Abstract

In this paper, we consider first-order logic over unary functions and study the
complexity of the evaluation problem for conjunctive queries described by
formulas of this kind.

A natural notion of query acyclicity for this language is introduced and we
study the complexity of a large number of variants or generalizations of acyclic
query problems in that context (Boolean or not Boolean, with or without inequalities,
comparisons, etc.). Our main results show that all those problems are fixed-parameter
linear, i.e., they can be evaluated in time $f(|Q|).|db|.|Q(db)|$ where $|Q|$ is
the size of the query $Q$, $|db|$ the database size, $|Q(db)|$ the size of the output
and $f$ is some function whose value depends on the specific variant of the query
problem (in some cases, $f$ is the identity function).

Our results have two kinds of consequences. First, they can be easily translated
to the relational (i.e., classical) setting. Previously known bounds for some query
problems are improved and new tractable cases are then exhibited. Among others,
as an immediate corollary, we improve a result of [PY99] by showing that any
(relational) acyclic conjunctive query with inequalities can be evaluated in time
$f(|Q|).|db|.|Q(db)|$.

A second consequence of our method is that it provides a very natural descriptive
approach to the complexity of well-known algorithmic problems. A number of
examples (such as acyclic subgraph problems, multidimensional matching, etc.)
are considered, for which new insights into their complexity are given.
1 Introduction



The complexity of relational query problems is an important and well-studied field of
database theory. In particular, the class of conjunctive queries (equivalent to
select-project-join queries), which are among the simplest, most natural and most
frequent types of queries, has received much attention.

A query problem takes as input a database $db$ and a query $Q$ and outputs $Q(db)$,
the result of the evaluation of $Q$ against $db$ (when the query is Boolean, $Q(db)$ is simply
yes or no). There are mainly two ways to investigate the complexity of such a problem.
In the combined complexity setting, one expresses the complexity of the problem in terms
of both the database size $|db|$ and the query size $|Q|$ (and of the output size $|Q(db)|$
if necessary). It is well known that, in that context, the Boolean conjunctive query
problem is NP-complete ([CM77, AHV95]). However, it is natural to consider that the
database size is incomparably bigger than the query size and to express the complexity
of the problem in terms of the database size only. In that case, the complexity of
the conjunctive query problem falls down to P (and even less). However, as discussed
by [PY99], that point of view is not completely satisfactory because, although the
problem becomes polynomial time decidable, the formula size may inherently occur in
the exponent of the polynomial. Even for small values of this parameter, this may lead
to intractable cases.

An interesting notion from parameterized complexity ([DF99]) that appears to be
very useful in the context of query evaluation (see [PY99]) is fixed-parameter tractability.
A (query) problem is said to be fixed-parameter (f.p.) tractable (resp. linear) if its time
complexity is $f(|Q|).P(|db|, |Q(db)|)$ for some function $f$ and some polynomial $P$ (resp.
linear polynomial $P$). In that case, the formula size influences the complexity of the
problem by a multiplicative factor only. Identifying the fragments of relational queries
that are f.p. tractable for small polynomials $P$ is then an important but difficult task.
Surprisingly, a very broad and well-studied set of queries appears to lie within this
class: as shown in [Yan81] (see also [FFG02] for a precise bound), ACQ, the acyclic
conjunctive query problem (we refer to the standard notion of acyclicity in databases; for
precise definitions see section 2.1), can be solved in polynomial time $O(|Q|.|db|.|Q(db)|)$.
Besides, it has been proved that evaluating an acyclic conjunctive query is not only
polynomial for sequential time but also highly parallelizable (see [GLS01]).

A natural extension ACQ$^{\neq}$ of ACQ allows inequalities between variables, i.e., atoms
of the form $x \neq y$. In [PY99], it is shown that this latter class of queries is also
f.p. tractable and can be evaluated in time $g(|Q|).|db|.|Q(db)|.\log^2 |db|$ where $g$ is an
exponential function. Despite these results, a lot of query problems, including the
extension of acyclic queries obtained by allowing comparisons of the form $x < y$, are
likely f.p. intractable, as shown again by [PY99].

In this paper, we revisit the complexity of acyclic conjunctive queries from a different
angle. First, a class of so-called unary functional queries based on first-order logic over
unary functions is introduced. Focusing on the existential fragment of this language,
we introduce a very natural graph-based notion of query acyclicity. We then show that
various classes of relational conjunctive query problems can be easily interpreted in linear
time by corresponding (unary) functional conjunctive query problems (see section 3): this
is done by switching from the classical language describing relations between elements
of some domain $D$ (i.e., the relational setting) to a functional one over the universe
of tuples: unary functions basically describe attribute values. In this context, unary
functional formulas can be seen as a logical embodiment of the well-known tuple calculus.
A nice feature of the reduction is that it preserves acyclicity of queries in the two
different contexts. The main part of the paper (section 5) is devoted to the analysis of
the complexity of the query problem for a wide range of syntactically defined functional
formulas. More precisely, we consider whether inequalities ($\neq$) are allowed or not, whether
the query is Boolean or not, and whether a restricted use of comparisons ($<$) is allowed.
In each case, we show that such queries can be evaluated in time $f(|Q|).|db|.|Q(db)|$
(in time $f(|Q|).|db|$ for the Boolean case) where the value of the function $f$ depends on the
precise (functional) query problem under consideration.

Coming back to the relational setting, as immediate corollaries, we obtain a substantial
(and optimal) improvement of the bound proved in [PY99] for the ACQ$^{\neq}$ problem
and a new proof of the complexity of the ACQ problem. Moreover, we generalize the
complexity bound for ACQ to a slightly larger class of queries, denoted by ACQ$^{+}$, that
allows comparisons ($<$, $\leq$, $\neq$) in a restricted way. This should be compared with the
result of [PY99] which shows that an unrestricted use of comparisons inside formulas
leads to an intractable query problem. The results of this paper imply that, regardless
of the query size, ACQ, ACQ$^{\neq}$ and ACQ$^{+}$ are inherently of the same data complexity.

One can easily describe algorithmic problems by queries written in some language.
This allows one to reduce the complexity of these problems to the complexity of query
evaluation for the language. In section 8, this well-known descriptive approach is used
for a number of algorithmic problems (like acyclic subgraph isomorphism, multidimensional
matching, etc.). They are considered in their decision version as well as in their
function or enumeration (of solutions) version. The variety of languages considered in
the paper makes it possible to express easily (i.e., without encoding) a wide range of properties
(on graphs, sets, functions, etc.). Our results provide new insight into the complexity
of these problems. In all cases, the best known (data) complexity bound is at least
reproved and sometimes improved.

The methods we use to prove the main results of this paper are, as far as we know,
original and quite different from those used so far in this context. They are essentially
a refinement of the methods introduced in a recent technical report by Frederic Olive
and the present authors (see [DGO04]): that paper essentially deals with hierarchies
of definability inside existential second-order logic in connection with nondeterministic
linear time. As in [DGO04], we introduce here a simple combinatorial notion
on unary functions called minimal sample (see section 4) and develop over this notion
some techniques of quantifier elimination in formulas that can be performed in linear time.
Considering unary functions in the language permits the introduction of simple but
powerful new logico-combinatorial methods (based mainly on graphs). Arguments for
this are given here through the consequences on the complexity of relational acyclic
conjunctive queries. There are other possible applications of the methods and the language;
they are discussed in the conclusion.
2 Preliminaries



The reader is expected to be familiar with first-order logic (see e.g. [EF99, Lib04]) but 
we briefly give some basic definitions on signatures, first-order structures and formulas. 

A signature (or vocabulary) $\sigma$ is a finite set of relation and function symbols, each of
which has a fixed arity which can be zero (0-ary function symbols are constant symbols
and 0-ary relation symbols are Boolean variables). The arity of $\sigma$ is the maximal arity
of its symbols. A signature whose arity is at most one is said to be unary.

A (finite) structure $S$ of vocabulary $\sigma$, or $\sigma$-structure, consists of a finite domain $D$
of cardinality $n > 1$, and, for any symbol $s \in \sigma$, an interpretation of $s$ over $D$ (often
denoted also by $s$, for simplicity).

We will often deal with tuples of objects. We denote them by bold letters: for
example, $\mathbf{x} = (x_1, \ldots, x_k)$. If $\mathbf{f}$ is a $k$-tuple of functions $(f_1, \ldots, f_k)$, then $\mathbf{f}(x)$ stands
for $(f_1(x), \ldots, f_k(x))$. Analogously, if $\mathbf{f}$ and $\mathbf{g}$ are two $k$-tuples of functions, $\mathbf{f}(x) = \mathbf{g}(y)$
stands for the logical statement: $f_1(x) = g_1(y) \wedge \ldots \wedge f_k(x) = g_k(y)$.

Let $\varphi \equiv \varphi(x_1, \ldots, x_k)$ be a first-order formula of signature $\sigma$ with free variables among
$x_1, \ldots, x_k$. Let $var(\varphi)$ denote the set of variables of $\varphi$. Let $\mathcal{L}$ be a class of first-order
formulas (also called a query language). The query problem associated to $\mathcal{L}$ (and also
denoted by $\mathcal{L}$) is defined as follows:

Input: A signature $\sigma$, a $\sigma$-structure $S$ of domain $D$ and a first-order $\sigma$-formula $\varphi(x_1, \ldots, x_k)$
of $\mathcal{L}$.

Output: The set $\varphi(S) =_{def} \{(a_1, \ldots, a_k) \in D^k : (S, a_1, \ldots, a_k) \models \varphi(x_1, \ldots, x_k)\}$.

In the following, query languages $\mathcal{L}$ are always specified by fragments of first-order
logic. The query $S \mapsto \varphi(S)$ is often identified with the formula $\varphi$ itself.

In this paper, we consider two different kinds of signature $\sigma$: either $\sigma$ contains
relation symbols only or it contains relation and function symbols of arity at most one.
In the first case, $\sigma$ is said to be relational, a $\sigma$-structure will often be denoted by $db$
and a $\sigma$-query by $Q$. In the second case, $\sigma$ is said to be unary functional or, for short,
functional, a $\sigma$-structure is often denoted by $\mathcal{F}$ and a $\sigma$-query by $\varphi$.

By making syntactic restrictions on the formula $\varphi$, one may define a number of query
problems. As we will see, the choice of the kind of signature also has some influence,
and we will define both relational query problems and the associated functional query
problems. In what follows we briefly recall the basics about "classical" conjunctive
queries and revisit this notion by introducing a new kind of functional query problem.

2.1 Conjunctive queries 

Conjunctive queries can be seen as select-join-project queries (with renaming of
variables). Logically speaking, they are equivalent to queries expressed by first-order
relational formulas with existential quantification and conjunction, i.e., of the form:

$$Q(y_1, \ldots, y_b) \equiv \exists x_1 \ldots \exists x_a \, \varphi(x_1, \ldots, x_a, y_1, \ldots, y_b)$$

where $\varphi$ is a conjunction of atoms over some relational signature $\sigma$ and variables among
$\mathbf{x}$, $\mathbf{y}$. If $Q$ has no free variable, the query is said to be Boolean.
Figure 1: The hypergraphs of queries $Q_1$ and $Q_2$



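Example 1 The two queries below are examples of conjunctive queries.

$$Q_1(y_1, y_2) \equiv \exists x_1 \exists x_2 \exists x_3 : R(x_1, y_1) \wedge S(x_1, y_2, x_2) \wedge T(y_2, x_3) \wedge R(x_1, x_2)$$

$$Q_2 \equiv \exists x_1 \exists x_2 \exists x_3 \exists x_4 \exists x_5 : S(x_1, x_2, x_3) \wedge S(x_1, x_4, x_5) \wedge R(x_3, x_5)$$

Query $Q_2$ is Boolean.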

An important and well-studied class of conjunctive queries are the so-called acyclic
conjunctive queries. To each conjunctive query $Q$ one associates the following hypergraph
$H_Q = (V, E)$: its set of variables is $V = var(Q)$ and its set of hyperedges is
$E = \{var(\alpha) : \alpha \text{ is an atom of } Q\}$. There exist various notions of acyclicity related
to hypergraphs. We have to use the most general one, which is defined as follows. A
hypergraph is acyclic if one can obtain the empty set by repeating the following two
rules (known as GYO rules, see [Gra79, YO79]) until no change occurs:

1. Remove hyperedges contained in other hyperedges; 

2. Remove vertices that appear in at most one hyperedge. 

As usual (see [Fag83]), a query is said to be acyclic if its associated hypergraph is acyclic. 
Denote by ACQ the class of acyclic conjunctive queries. 

Example 2 The hypergraphs associated to queries $Q_1$ and $Q_2$ of Example 1 are shown in
Figure 1. Applying GYO rules shows that $Q_1$ is acyclic and $Q_2$ is cyclic.
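The reduction of Example 2 can be replayed mechanically. The following Python sketch is only an illustration of the two GYO rules, not an algorithm taken from this paper: the function name `gyo_is_acyclic` and the encoding of each atom by its set of variable names are assumptions made for the example, and no attempt is made to reach the linear-time bounds discussed elsewhere.

```python
def gyo_is_acyclic(hyperedges):
    """Apply the two GYO rules until no change occurs; acyclic iff nothing is left."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Rule 2: remove vertices that appear in at most one hyperedge.
        for e in edges:
            for v in list(e):
                if sum(1 for other in edges if v in other) <= 1:
                    e.remove(v)
                    changed = True
        # A hyperedge that lost all its vertices is simply dropped.
        if any(not e for e in edges):
            edges = [e for e in edges if e]
            changed = True
        # Rule 1: remove one hyperedge contained in another hyperedge.
        for i, e in enumerate(edges):
            if any(i != j and e <= f for j, f in enumerate(edges)):
                edges.pop(i)
                changed = True
                break
    return not edges


# Hyperedges of Q1 and Q2 (one set of variables per atom of Example 1).
Q1 = [{"x1", "y1"}, {"x1", "y2", "x2"}, {"y2", "x3"}, {"x1", "x2"}]
Q2 = [{"x1", "x2", "x3"}, {"x1", "x4", "x5"}, {"x3", "x5"}]
print(gyo_is_acyclic(Q1), gyo_is_acyclic(Q2))
```

For $Q_2$ the reduction gets stuck on the triangle $\{x_1, x_3\}, \{x_1, x_5\}, \{x_3, x_5\}$, which is why the query is cyclic.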

Keeping the same notion of acyclicity, one can enlarge this class of queries by
allowing inequalities between variables (as defined in [PY99]). This defines the larger
class of so-called ACQ$^{\neq}$ queries.

Example 3 The query below is an example of an ACQ$^{\neq}$ query.

$$Q_3(y_1, y_2) \equiv \exists x_1 \exists x_2 \exists x_3 : R(x_1, y_1) \wedge S(x_1, y_2, x_2) \wedge T(y_2, x_3) \wedge R(x_1, x_2) \wedge y_1 \neq x_3 \wedge x_2 \neq x_1.$$
Figure 2: Tree decomposition of query $Q_4$



Alternatively, it is well known that a conjunctive query $Q(\mathbf{y})$ is acyclic if and only if
it has a join forest (called join tree in case the forest is connected), that is, an acyclic graph
$G_Q = (V, E)$ whose set of vertices $V$ is the set of atoms of $Q$ and such that, for each
variable $x$ that occurs in $Q$, the set of relational atoms where $x$ occurs is connected
(is a subtree) in $G_Q$. Similarly, a conjunctive query $Q(\mathbf{y})$ with inequalities is in ACQ$^{\neq}$
if it has a join forest $G_Q$. Note that $G_Q$ relies upon the relational atoms but does not
take into account the inequalities.

One obtains another natural generalization of acyclic queries by allowing comparison
atoms $x < y$. As proved by [PY99], the evaluation problem of such queries is as difficult
with respect to parameterized complexity as the clique problem (both are W[1]-complete
problems) and hence is similarly conjectured to be f.p. intractable. Surprisingly, we will
show that for the following class of acyclic queries with (restricted use of) comparisons,
denoted by ACQ$^{+}$, the evaluation problem is exactly as difficult, with respect to time
complexity, as that of ACQ. A conjunctive query $Q$ with comparisons, i.e., atoms of
the form $x \theta y$ where $x$, $y$ are variables and $\theta \in \{\neq, <, \leq, >, \geq\}$, is in ACQ$^{+}$ if

1. it has a join forest $G_Q = (V, E)$ (defined as usual),

2. for each comparison $x \theta y$ of $Q$, either $\{x, y\} \subseteq var(\alpha)$ for some relational atom $\alpha$
of $Q$, or there is some edge $(\alpha, \beta) \in E$ in $G_Q$ such that $x \in var(\alpha)$ and $y \in var(\beta)$,
and

3. for each edge $(\alpha, \beta) \in E$, there is at most one comparison $x \theta y$ in $Q$ such that
$x \in var(\alpha)$ and $y \in var(\beta)$.

In other words, a conjunctive query with comparisons is in ACQ$^{+}$ if it has a join forest
$G_Q$ and if each comparison of $Q$ relates two variables inside the same vertex of $G_Q$ or
along an edge of $G_Q$, with globally at most one comparison per edge. The reason for
authorizing only one comparison per edge of the tree will be explained later in Remark 6.

Example 4 Query $Q_4$ below is in ACQ$^{+}$.

$$Q_4(y_1, y_2) \equiv \exists x_1 \exists x_2 \exists x_3 : R(x_1, y_1) \wedge S(x_1, y_2, x_2) \wedge S(y_2, x_3, y_2) \wedge R(x_1, x_2) \wedge y_1 < y_2 \wedge x_1 > x_3.$$

Its join tree is shown in Figure 2.
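Conditions 2 and 3 of the definition of ACQ$^{+}$ can be checked directly once a join forest is given. The sketch below is a hedged illustration: the function `satisfies_acq_plus`, the atom names `A1` to `A4`, and in particular the join tree used for $Q_4$ are assumptions (Figure 2 is not reproduced here, so a star centred on the atom $S(x_1, y_2, x_2)$ is assumed). Condition 1 is taken for granted, and an edge is treated as unordered when counting comparisons.

```python
def satisfies_acq_plus(atoms, forest_edges, comparisons):
    """Check conditions 2 and 3 for a query whose join forest is already known.

    atoms:        dict mapping an atom name to its set of variables
    forest_edges: list of (atom name, atom name) pairs, the edges of the join forest
    comparisons:  list of (x, y) variable pairs; the comparison operator is irrelevant
    """
    def inside_one_atom(x, y):
        return any({x, y} <= variables for variables in atoms.values())

    def along(edge, x, y):
        a, b = edge
        return (x in atoms[a] and y in atoms[b]) or (x in atoms[b] and y in atoms[a])

    # Condition 2: each comparison lies inside one atom or along one forest edge.
    for x, y in comparisons:
        if not inside_one_atom(x, y) and not any(along(e, x, y) for e in forest_edges):
            return False
    # Condition 3: at most one comparison per forest edge.
    for e in forest_edges:
        if sum(along(e, x, y) for x, y in comparisons) > 1:
            return False
    return True


# Atoms of Q4 and an assumed join tree (a star around A2).
atoms_q4 = {
    "A1": {"x1", "y1"},        # R(x1, y1)
    "A2": {"x1", "y2", "x2"},  # S(x1, y2, x2)
    "A3": {"y2", "x3"},        # S(y2, x3, y2)
    "A4": {"x1", "x2"},        # R(x1, x2)
}
tree_q4 = [("A1", "A2"), ("A2", "A3"), ("A2", "A4")]
comparisons_q4 = [("y1", "y2"), ("x1", "x3")]
print(satisfies_acq_plus(atoms_q4, tree_q4, comparisons_q4))
```

Under these assumptions, adding a further comparison between $x_2$ and $x_3$ would put two comparisons on the edge between `A2` and `A3` and violate condition 3.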
Finally, as defined in [FFG02], a query $Q(\mathbf{y})$ is said to be strict if there exists a
relational atom $\alpha$ in $Q$ such that $\mathbf{y} \subseteq var(\alpha)$. We denote by ACQ$_1$, ACQ$_1^{\neq}$ and ACQ$_1^{+}$
the restrictions of the classes of queries ACQ, ACQ$^{\neq}$, ACQ$^{+}$, respectively, to strict
queries.

2.2 Conjunctive functional queries. 

In all this part, $\sigma$ is a unary functional signature. In full generality, a conjunctive
functional query is a conjunctive query over some unary functional signature $\sigma$. More
precisely, it is of the form:

$$\varphi(\mathbf{y}) \equiv \exists x_1 \ldots \exists x_b : \bigwedge_{i=1}^{h} \tau_i(z_i) = \tau'_i(t_i) \wedge \bigwedge_{i=1}^{k} U_i(\tau''_i(v_i))$$

with $z_i, t_i, v_i \in var(\varphi)$, and where $\tau_i, \tau'_i, \tau''_i$ are terms made of compositions of unary function
symbols of $\sigma$. For example, $\tau(x) = f_1 f_2 \ldots f_k(x)$. Formulas are then interpreted on
functional structures with totally defined unary functions.

In this paper, formulas over a functional language are viewed as an analog of the
well-known "tuple calculus". For the sake of clarity, we will adopt the following
choices in the presentation (these choices do not restrict the applicability of our results
to queries of the most general form; see also Remark 1). In what follows, structures are
considered as multisorted unary algebras, i.e., as collections of partially defined unary
functions. Let $\sigma = \sigma_{rel} \cup \sigma_{fun}$ where $\sigma_{rel}$ contains unary relation symbols only and $\sigma_{fun}$
contains unary function symbols. A $\sigma$-structure $\mathcal{F}$ will satisfy:

• Its finite domain $D$ is such that $D$ is the union of all sets $T \in \sigma_{rel}$. Also, for all
distinct $T_1, T_2 \in \sigma_{rel}$, $T_1 \cap T_2 = \emptyset$.

• For each function $f \in \sigma_{fun}$, there is a collection $T_{j_1}, \ldots, T_{j_k}$ of sets in $\sigma_{rel}$ such
that $f$ is defined over $\bigcup_{i \leq k} T_{j_i}$ (and undefined elsewhere) and has values in $D$.

This definition reflects the fact that each $T \in \sigma_{rel}$ is seen as a set of tuples, with
each function $f \in \sigma_{fun}$ being a projection function from tuples to the domain $D$. The
number of functions defined over $T$ is equal to the arity of the underlying relation that
$T$ represents.

As far as $\sigma$-formulas are concerned, two restrictions will be adopted in this paper.

• Quantifications will always be relativized to some universe $T \in \sigma_{rel}$, i.e., formulas
are of the form $(\exists x \in T)\varphi$, which is equivalent to $\exists x \, T(x) \wedge \varphi$.

• All atoms are of the form $x \in T$ for $T \in \sigma_{rel}$ or $f(x) = g(y)$ for $f, g \in \sigma_{fun} \cup \{Id\}$
where $Id$ is the identity function. Note that composition of functions is not allowed
here.

Example 5 Formula $\varphi_1$ below defines a functional conjunctive query.

$$\varphi_1(x) \equiv \exists y \in T_1, \exists z \in T_2 : f_1(x) = g_1(y) \wedge f_2(x) = h_1(z) \wedge f_1(y) = g_2(z) \wedge f_1(z) \neq h_1(y) \wedge x \in T_1$$

As in the relational setting, one can define a notion of acyclic (unary) functional 
queries. The definition is even more natural and simpler since it relies upon graphs 
instead of hypergraphs. 

Definition 1 Let $\varphi$ be a conjunctive functional query. The undirected graph $G_\varphi = (V, E)$
associated to $\varphi$ is defined by: $V = var(\varphi)$ and, for all distinct $x, y \in V$, $(x, y) \in E$ iff $\varphi$
contains at least one atom of the form $f(x) = g(y)$ for some $f, g \in \sigma \cup \{Id\}$. The query $\varphi$ is
acyclic if its graph $G_\varphi$ is acyclic.
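Definition 1 translates into a short test on the equality atoms of a query. The Python sketch below is an illustration under stated assumptions: the encoding of an atom $f(x) = g(y)$ as a tuple `(f, x, g, y)`, the function name `is_acyclic_functional_query` and the sample atoms are all hypothetical, and the union-find structure is a standard way to detect a cycle in an undirected graph rather than a method taken from this paper.

```python
def is_acyclic_functional_query(variables, equality_atoms):
    """Return True iff the undirected graph of Definition 1 is acyclic.

    equality_atoms: iterable of (f, x, g, y) tuples standing for f(x) = g(y).
    Several atoms between the same two variables yield a single edge.
    """
    parent = {v: v for v in variables}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Edges join distinct variables only; atoms such as f(x) = g(x) add no edge.
    edges = {frozenset((x, y)) for (_f, x, _g, y) in equality_atoms if x != y}
    for edge in edges:
        x, y = tuple(edge)
        root_x, root_y = find(x), find(y)
        if root_x == root_y:
            return False  # x and y were already connected: this edge closes a cycle
        parent[root_x] = root_y
    return True


# Hypothetical atoms: f(x1) = f(y1), g(x1) = Id(x2), g(x1) = f(y2).
variables = ["x1", "x2", "y1", "y2"]
atoms = [("f", "x1", "f", "y1"), ("g", "x1", "Id", "x2"), ("g", "x1", "f", "y2")]
print(is_acyclic_functional_query(variables, atoms))
```

Adding an atom such as $f(x_2) = g(y_2)$ to this hypothetical query would create the cycle $x_1, x_2, y_2$ and make it cyclic.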

We denote by F-ACQ the class of acyclic (conjunctive) functional queries. Again, one
may authorize the use of negation inside queries. We then denote by F-ACQ$^{\neq}$ the class
of acyclic functional queries $\varphi$ whose atoms are of one of the three forms $f(x) = g(y)$,
$f(x) \neq g(y)$, or $T(x)$, for $f, g \in \sigma \cup \{Id\}$ and $T \in \sigma$ (recall that the notion of acyclicity
relies upon equalities only).

Example 6 The following query $\varphi_2$ belongs to F-ACQ$^{\neq}$.

$$\varphi_2(y_1, y_2) \equiv \exists x_1 \in T_1, \exists x_2 \in T_2, \exists x_3 \in T_2 : f(x_1) = f(y_1) \wedge g(x_1) = x_2 \wedge g(x_1) = f(y_2) \wedge g(y_2) = x_3 \wedge x_3 \neq f(x_1) \wedge g(y_1) \neq f(y_2).$$

The associated graph of $\varphi_2$ is given in Figure 3.

Similarly, let F-ACQ$^{+}$ denote the class of acyclic functional queries $\varphi$ whose atoms
are of the form $f(x) = g(y)$ or $f(x) \theta g(y)$ or $U(x)$, for $f, g \in \sigma \cup \{Id\}$, $U \in \sigma$, and
$\theta \in \{\neq, <, \leq, >, \geq\}$, whose associated graph (defined in Definition 1) is acyclic and
for which the following holds: if $f(x) \theta g(y)$ is a comparison atom of $\varphi$ for two distinct
variables $x$ and $y$ then $(x, y) \in E$ and, conversely, for each edge $(x, y) \in E$, there is at
most one comparison $f(x) \theta g(y)$ in $\varphi$.

Example 7 Here is an example of an F-ACQ$^{+}$ query.

$$\varphi_3(y_1, y_2) \equiv \exists x_1 \in T_1, \exists x_2 \in T_2, \exists x_3 \in T_2 : f(x_1) = f(y_1) \wedge g(x_1) = x_2 \wedge g(x_1) = f(y_2) \wedge g(y_2) = x_3 \wedge f(x_1) < g(y_2) \wedge f(y_2) > g(x_3).$$

Its associated graph is given in Figure 3.

In analogy with the notion defined above in the relational setting, a functional query
is said to be strict if it contains at most one free variable. We denote by F-ACQ$_1$, F-ACQ$_1^{\neq}$
and F-ACQ$_1^{+}$ the restrictions of the three classes of queries defined above to strict
queries.
Figure 3: Graph of queries $\varphi_2$ (without comparisons) and $\varphi_3$

In this paper, we will make extensive use of a class of queries defined, roughly
speaking, as the complement of acyclic functional queries. Let F-FO be the class of
first-order queries defined by universal formulas in conjunctive normal form over some
(unary) functional signature $\sigma$, i.e., formulas of the form:

$$\varphi(\mathbf{y}) \equiv \forall \mathbf{x} : \bigwedge_{i \leq k} C_i(\mathbf{x}, \mathbf{y})$$

where each $C_i$ is a clause, i.e., a disjunction of literals of the form $(\neg) f(z) = g(t)$ or
$U(z)$ for $f, g \in \sigma \cup \{Id\}$ and $U \in \sigma$.

The negation of an F-FO formula is clearly a disjunction of conjunctive functional
queries $\exists \mathbf{x} \, \neg C_i$. An F-FO query $\varphi$ is said to be acyclic if each query $\exists \mathbf{x} \, \neg C_i$ is acyclic.
By definition, the acyclicity of an F-FO query can be read directly on each clause of the
query by looking at the inequalities $f(z) \neq g(t)$ of the clause. The class of acyclic F-FO
queries is denoted by F-AFO; its restriction to strict queries is denoted by
F-AFO$_1$.

Remark 1 In the formulas we consider, terms made of compositions of functions are not
authorized at first sight. However, our results easily apply to this more general kind of formula:
for each term $\tau(x) = f_1 \ldots f_k(x)$, one may add $\tau$ as a new unary function symbol in the
signature and precompute $\tau(x)$, for each $x \in D$, from $f_1, \ldots, f_k$ in linear time. In this way, one
obtains an equivalent query problem without composed terms. Also, obviously, relativization
and the use of partially defined functions do not play an essential role as far as
the complexity results presented here are concerned.

2.3 Basic notions of complexity 

The model of computation used in this paper is the Random Access Machine (RAM) 
with uniform cost measure (see [AHU74, GS02, GO04, FFG02]). Basically, our inputs 
are first-order structures and first-order formulas. 

Let $E$ be a finite set or relation. We denote by $card(E)$ the cardinality of $E$. Let $[n]$
be the set $\{1, \ldots, n\}$. A set of cardinality $n$ is often identified with the set $[n]$.

The size $|I|$ of an object $I$ is the number of registers used to store $I$ in the RAM. If
$E$ is the set $[n]$, $|E| = card(E) = n$. If $R \subseteq D^k$ is a $k$-ary relation over domain $D$, with
$|D| = card(D)$, then $|R| = k.card(R)$: all the tuples $(x_1, \ldots, x_k)$ for which $R(x_1, \ldots, x_k)$
holds must be stored, each in a $k$-tuple of registers. Similarly, if $f$ is a $k$-ary function
from $D^k$ to $D$, all values $f(x_1, \ldots, x_k)$ must be stored and $|f| = |D|^k$.

If $\varphi$ is a first-order formula, $|\varphi|$ is the number of occurrences of variables, relation
or function symbols and syntactic symbols: $\exists$, $\forall$, $\wedge$, $\vee$, $\neg$, $=$, "(", ")", ",". For example, if
$\varphi \equiv \exists x \exists y \, R(x, y) \wedge \neg(x = y)$ then $|\varphi| = 17$.
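The count of 17 can be checked by listing symbol occurrences. The tiny Python sketch below assumes that the example formula reads $\exists x \exists y \, R(x, y) \wedge \neg(x = y)$ and that every variable and symbol is written with a single character, so that each non-blank character is one occurrence; the helper name `formula_size` is hypothetical.

```python
def formula_size(formula: str) -> int:
    """Count symbol occurrences, assuming one character per symbol and ignoring blanks."""
    return sum(1 for ch in formula if not ch.isspace())


# Assumed reading of the example formula of the text.
phi = "∃x∃y R(x,y) ∧ ¬(x = y)"
print(formula_size(phi))
```

The occurrences are: two quantifiers, five variable occurrences in total after the quantifier prefix and two inside it, one relation symbol, two pairs of parentheses, one comma, one conjunction, one negation and one equality sign.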

All the problems we consider in this paper are parameterized problems: each takes
as input a list of objects $I$ (e.g., a $\sigma$-structure $S$ and a formula $\varphi$) together with a
parameter $k$ (e.g., the size of $\varphi$) and outputs an object $S$ (e.g., the result of the query).

A problem $P$ is computable in time $f(k).T(|I|, |S|)$ for some function $f : \mathbb{N} \rightarrow \mathbb{N}$ if
there exists a RAM that computes $P$ in time (i.e., the number of instructions performed)
bounded by $f(k).T(|I|, |S|)$ using space, i.e., addresses and register contents, also bounded
by $f(k).T(|I|, |S|)$ $^1$. The notation $O_k(T(|I|, |S|))$ is used when one does not want to
make precise the value of the function $f$.

Definition 2 Let $T$ be a polynomial function. A property $P$ is fixed-parameter tractable if it is
computable in time $f(k).T(|I|, |S|)$. When $T$ is of the form $T(n, p) = (n \times p)$, $P$ is said to be
fixed-parameter linear.

It is easy to see that one obtains the same complexity measure if, instead of the
uniform cost, the logarithmic cost is adopted, i.e., if the time of each instruction is
the number of bits of the objects it manipulates. E.g., if the "uniform" time (and
space) complexity is $O_k(|I|.|S|)$ then the corresponding "logarithmic" time complexity is
$O_k(|I|.|S|.\log(|I|.|S|))$, which is at most (and in fact less than) $O_k(|I|.\log|I|.|S|.\log|S|) =
O_k(|I|_{bit}.|S|_{bit})$ where $|I|_{bit} = \Theta(|I|.\log|I|)$ denotes the number of bits, i.e., the size in
the logarithmic cost view, of the input $I$.

3 Translating relational queries into functional queries 

The transformation of (acyclic) queries to be constructed in this section is very 
similar to the translation of the domain relational calculus into the tuple relational 
calculus in the classical framework of database theory. Although one needs to examine 
carefully all the details, the idea is very simple. 

We want to transform each input $(\sigma, Q, db)$ of a relational query problem into an
input $(\sigma', \varphi_Q, \mathcal{F}_{db})$ of a (unary) functional query problem so that, among other things,
$Q(db)$ is some projection of the relation $\varphi_Q(\mathcal{F}_{db})$. Let us describe successively the
transformation of the structure and the corresponding transformation of the query.

3.1 Transformation of the structure 

Let $db$ be a relational $\sigma$-structure $db = (D; R_1, \ldots, R_q)$ with each $R_i$ of arity $m_i$. For
convenience and simplicity (but w.l.o.g.), assume that there is no isolated element in $D$,
i.e., for each $x \in D$, there exist $i \leq q$ and some tuple $t$ in $R_i$ to which $x$ belongs. Let
$m = \max_{i \leq q} m_i$ be the maximal arity among the relations $R_i$. The associated functional
$\sigma'$-structure is defined as follows:

$$\mathcal{F}_{db} = (D'; D, T_1, \ldots, T_q, f_1, \ldots, f_m)$$

where the domain $D'$ is the disjoint union of $q + 1$ sets $D' = D \cup T_1 \cup \ldots \cup T_q$ where $D$
is the domain of $db$ and each $T_i$, $1 \leq i \leq q$, is a set of elements identified with the tuples
of $R_i$ ($card(T_i) = card(R_i)$); each of $D, T_1, \ldots, T_q$ is a unary relation and each $f_j$,
$1 \leq j \leq m$, is a unary function.

$^1$This last restriction on addresses and register contents forces the RAM to use its memory in a
"compact" way with space not greater than time.

The functions $f_j$ are defined as follows. For each $R_i$ of arity $m_i \leq m$ ($1 \leq i \leq q$) and for
each $t \in T_i$ that represents the tuple $(e_1, \ldots, e_{m_i})$ of $R_i$, set $f_1(t) = e_1, \ldots, f_{m_i}(t) = e_{m_i}$.
Intuitively, each $f_j$ is the $j$-th projection for each tuple: it is defined on the sets
$T_i$ that represent relations $R_i$ with $m_i \geq j$ and undefined elsewhere. Clearly, the functional
structure $\mathcal{F}_{db}$ encodes the whole database structure $db$. We first have to prove the
following result.

Proposition 1 The transformation $db \mapsto \mathcal{F}_{db}$ is computable in linear time $O(|db|)$.

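Proof. Since the transformation is immediate, we only have to prove that $|\mathcal{F}_{db}| =
O(|db|)$. It is essential to notice that each $f_j$ is defined and described on some subset
of $\bigcup_{i \leq q} T_i$ so that $|f_j| = O(\sum_{i \leq q, \, j \leq m_i} card(T_i)) = O(\sum_{i \leq q, \, j \leq m_i} card(R_i))$ and hence
$\sum_{j \leq m} |f_j| = O(\sum_{i \leq q} m_i.card(R_i)) = O(\sum_{i \leq q} |R_i|) = O(|db|)$. Finally,
$|\mathcal{F}_{db}| = |D'| + |D| + \sum_{i \leq q} |T_i| + \sum_{j \leq m} |f_j| = O(|db|)$. □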
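The construction of $\mathcal{F}_{db}$ and the size argument of Proposition 1 can be made concrete with a small Python sketch. It is an illustration only: the function `build_functional_structure`, the representation of each tuple element by a pair `(relation name, index)` and the toy database are assumptions introduced here, and dictionaries stand in for the partially defined projection functions.

```python
def build_functional_structure(domain, relations):
    """Build (D', T, f) from a relational database.

    domain:    iterable of domain elements of db
    relations: dict mapping a relation name to a list of distinct tuples
    Returns the new domain D', the unary relations T (one set per relation name)
    and a list f where f[j - 1] is the partial projection function f_j.
    """
    max_arity = max((len(t) for tuples in relations.values() for t in tuples), default=0)
    f = [dict() for _ in range(max_arity)]
    T = {}
    for name, tuples in relations.items():
        T[name] = set()
        for index, tup in enumerate(tuples):
            tuple_id = (name, index)  # fresh element standing for this tuple
            T[name].add(tuple_id)
            for j, value in enumerate(tup):
                f[j][tuple_id] = value  # f_{j+1} is defined on this tuple
    new_domain = set(domain) | set().union(*T.values())
    return new_domain, T, f


# Toy database: one ternary and one binary relation.
relations = {"R1": [(1, 2, 3), (2, 2, 4)], "R2": [(1, 2)]}
new_domain, T, f = build_functional_structure({1, 2, 3, 4}, relations)
# Number of stored function values: sum over relations of arity * cardinality.
print(sum(len(f_j) for f_j in f))
```

Each tuple is visited once and contributes one stored value per field, which mirrors the bound $\sum_{j \leq m} |f_j| = O(\sum_{i \leq q} m_i.card(R_i))$ used in the proof.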



3.2 Transformation of the query 

The transformation is essentially the same for all the variants (ACQ, ACQ$^{\neq}$, etc.) of
acyclic queries. We present it here for ACQ$_1^{\neq}$. Let $Q(y_1, \ldots, y_b)$ denote an ACQ$_1^{\neq}$
query, i.e., a strict acyclic query with inequalities of the form:

$$Q(y_1, \ldots, y_b) \equiv \exists x_1 \ldots \exists x_a \, \Psi(x_1, \ldots, x_a, y_1, \ldots, y_b)$$

with $\Psi(\mathbf{x}, \mathbf{y}) \equiv \bigwedge_{1 \leq i \leq k} A_i \wedge I$ where the $A_i$'s are relational atoms and $I$ is a conjunction
of variable inequalities $v \neq v'$ for $v, v' \in \{\mathbf{x}, \mathbf{y}\}$.

By definition of strict acyclicity, $Q$ has a join forest $F = (V, E)$ whose set of
vertices is $V = \{A_1, \ldots, A_k\}$ so that $\mathbf{y} \subseteq var(A_1)$. We want to construct a conjunctive
functional $\sigma'$-formula $\varphi_Q$ whose graph $G_{\varphi_Q}$ is exactly the acyclic graph $F$. Roughly, the
idea is to replace the $k$ atoms $A_1, \ldots, A_k$ by $k$ variables $t_1, \ldots, t_k$ that represent the
corresponding tuples. For a relational atom $A_u$, $1 \leq u \leq k$, let $var_i(A_u)$ denote the $i$-th
variable of $A_u$: e.g., if $A_u$ is the atom $S(y_2, x_3, y_2)$ then $var_1(A_u) = var_3(A_u) = y_2$. As
defined before, each function $f_j$, $j \leq m$, of the functional structure $\mathcal{F}_{db}$ gives for each tuple
$t$ (of a relation of $db$) its $j$-th field $f_j(t)$. E.g., the above equality $var_1(A_u) = var_3(A_u)$ for
$A_u = S(y_2, x_3, y_2)$ is expressed by the formula $f_1(t_u) = f_3(t_u)$. The following functional
formula essentially mimics the description of the formula $Q$ and of its forest $F = (V, E)$:

$$\varphi_Q(\mathbf{t}) \equiv \Psi_{rel}(\mathbf{t}) \wedge \Psi_V(\mathbf{t}) \wedge \Psi_E(\mathbf{t}) \wedge \Psi_I(\mathbf{t}).$$

Each conjunct of $\varphi_Q$ is described precisely as follows.
• $\Psi_{rel}(\mathbf{t})$ is $\bigwedge_{u \leq k} T_{i_u}(t_u)$, where the atom $A_u$ is of the form $R_{i_u}(\ldots)$.

• $\Psi_V(\mathbf{t})$ is $\bigwedge_{u \leq k} \Psi_u$ where $\Psi_u$ is nonempty if $A_u$ has at least one repeated variable
and contains, for each (repeated) variable that occurs at successive indices $j_1, \ldots, j_r$
of $A_u$, the conjunction $\bigwedge_{i < r} f_{j_i}(t_u) = f_{j_{i+1}}(t_u)$.

• $\Psi_E(\mathbf{t})$ is $\bigwedge_{(A_u, A_v) \in E} \Psi_{u,v}$ where $\Psi_{u,v}$ contains, for each variable $w$ that occurs both
in $A_u$ and $A_v$ with $(A_u, A_v) \in E$, one equality of the form $f_i(t_u) = f_j(t_v)$ for two
arbitrarily chosen indices $i, j$ such that $var_i(A_u) = w = var_j(A_v)$.

• $\Psi_I(\mathbf{t})$ is constructed as follows. For each inequality $w \neq w'$ of $I$, choose (arbitrarily,
again) two atoms $A_u$ and $A_v$ so that $w$ (resp. $w'$) occurs in $A_u$ (resp. $A_v$) at index
$i$ (resp. $j$). Replace $w \neq w'$ by the inequality $f_i(t_u) \neq f_j(t_v)$. Let $\Psi_I$ be the
conjunction of all those inequalities.

Due to the formula $\Psi_{rel}(\mathbf{t})$, each quantified variable is relativized to some domain $T_i$.

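Example 8 The following query:

$$Q(y_1, y_2) \equiv \exists x_1 \exists x_2 \exists x_3 : R_1(x_1, y_1, y_2) \wedge R_2(x_2, x_1, x_2) \wedge R_1(x_2, x_2, x_3) \wedge y_1 \neq x_2,$$

is translated into the formula $\varphi_Q(\mathbf{t})$, with $\mathbf{t} = (t_1, t_2, t_3)$, that is the conjunction of the
following formulas:

$$\Psi_{rel}(\mathbf{t}) \equiv T_1(t_1) \wedge T_2(t_2) \wedge T_1(t_3)$$
$$\Psi_V(\mathbf{t}) \equiv f_1(t_2) = f_3(t_2) \wedge f_1(t_3) = f_2(t_3)$$
$$\Psi_E(\mathbf{t}) \equiv f_1(t_1) = f_2(t_2) \wedge f_1(t_2) = f_1(t_3)$$
$$\Psi_I(\mathbf{t}) \equiv f_2(t_1) \neq f_1(t_2)$$

Finally, it is easy to check that the following equality holds:

$$Q(db) = \{(f_2(t_1), f_3(t_1)) : t_1 \in D' \text{ and } \mathcal{F}_{db} \models \exists t_2 \exists t_3 \, \varphi_Q(\mathbf{t})\}.$$

In other words, $Q(db)$ is the result of the projection $\mathbf{t} \mapsto (f_2(t_1), f_3(t_1))$ applied to the relation
$\varphi_Q(\mathcal{F}_{db})$. Obviously, the formula $\exists t_2 \exists t_3 \, \varphi_Q(\mathbf{t})$ is equivalent to the relativized formula:

$$\exists t_3 \in T_1, \exists t_2 \in T_2 : T_1(t_1) \wedge \Psi_V(\mathbf{t}) \wedge \Psi_E(\mathbf{t}) \wedge \Psi_I(\mathbf{t}).$$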
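The equality stated in Example 8 can be checked by brute force on a small instance. The following Python sketch is a hedged illustration, not an evaluation algorithm from this paper: the toy relations `R1` and `R2`, the helper `f` and both function names are assumptions, each tuple of a relation is used directly as the element of $T_i$ that represents it, and no attention is paid to efficiency.

```python
from itertools import product

# Toy database (an assumption made for this example): two ternary relations.
R1 = {(1, 5, 6), (2, 2, 7), (3, 3, 3)}
R2 = {(2, 1, 2), (3, 3, 3)}
D = {value for tup in R1 | R2 for value in tup}


def f(j, t):
    """Projection f_j of the functional structure: the j-th field of tuple t."""
    return t[j - 1]


def eval_relational():
    """Evaluate Q(y1, y2) of Example 8 directly over the domain D."""
    result = set()
    for x1, x2, x3, y1, y2 in product(D, repeat=5):
        if ((x1, y1, y2) in R1 and (x2, x1, x2) in R2
                and (x2, x2, x3) in R1 and y1 != x2):
            result.add((y1, y2))
    return result


def eval_functional():
    """Evaluate the translated formula and project t onto (f_2(t1), f_3(t1))."""
    result = set()
    # Psi_rel: t1 and t3 range over T_1 (tuples of R1), t2 over T_2 (tuples of R2).
    for t1, t2, t3 in product(R1, R2, R1):
        psi_v = f(1, t2) == f(3, t2) and f(1, t3) == f(2, t3)
        psi_e = f(1, t1) == f(2, t2) and f(1, t2) == f(1, t3)
        psi_i = f(2, t1) != f(1, t2)
        if psi_v and psi_e and psi_i:
            result.add((f(2, t1), f(3, t1)))
    return result


print(eval_relational() == eval_functional())
```

On this instance the tuple $(3, 3, 3)$ satisfies all equalities but is rejected by the inequality $y_1 \neq x_2$, in both the relational and the functional evaluation.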

More generally, the transformation process described before yields the following
properties.

Lemma 2 Let $Q(y_1, \ldots, y_b)$ be a query in ACQ$_1^{\neq}$ (resp. ACQ$^{\neq}$, ACQ$^{+}$, ACQ, ACQ$_1$,
ACQ$_1^{+}$). The following properties hold:

1. $\varphi_Q \in$ F-ACQ$_1^{\neq}$ (resp. F-ACQ$^{\neq}$, F-ACQ$^{+}$, F-ACQ, F-ACQ$_1$, F-ACQ$_1^{+}$).

2. For each relational $\sigma$-structure $db$, the result $Q(db)$ (of query $Q$ over $db$) is obtained by
some "projection" of the relation $\varphi_Q(\mathcal{F}_{db})$. More precisely, there are two lists of indices
$i_1, \ldots, i_b$ and $j_1, \ldots, j_b$ such that $^2$

$$Q(db) = \{(f_{i_1}(t_{j_1}), \ldots, f_{i_b}(t_{j_b})) : \text{there exist } t_1, \ldots, t_k \in D' \text{ such that } (\mathcal{F}_{db}, \mathbf{t}) \models \varphi_Q(\mathbf{t})\}$$

where $y_h = var_{i_h}(A_{j_h})$ for $h = 1, \ldots, b$.

3. $|\varphi_Q| = O(|Q|)$.

Proof. For simplicity of notation, let us still assume that $Q$ belongs to ACQ$_1^{\neq}$.

1. By construction, the graph $G_{\varphi_Q}$ is (up to isomorphism) the join forest $F$ (associated
to $Q$); this corresponds to the conjunct $\Psi_E$. See also Remark 2.

2. By definition of the join forest $F$, the set of atoms where any fixed variable of $Q$
occurs is connected in $F$. This implies that the conjunct $\Psi_V \wedge \Psi_E$ exactly expresses
which variables the relational atoms $A_1, \ldots, A_k$ of $Q$ share. Moreover, $\Psi_I$ correctly
expresses the conjunction $I$ of inequalities of $Q$. This proves that for each relational
$\sigma$-structure $db = (D; R_1, \ldots, R_q)$ where $\mathcal{F}_{db} = (D'; D, T_1, \ldots, T_q, f_1, \ldots, f_m)$ and
for all $\mathbf{y} \in D^b$, it holds:

$$(db, \mathbf{y}) \models Q(\mathbf{y}) \iff \text{there exist } t_1, \ldots, t_k \in D' \text{ such that } (\mathcal{F}_{db}, \mathbf{t}) \models \varphi_Q(\mathbf{t}) \text{ and } f_{i_h}(t_{j_h}) = y_h \text{ for each } h = 1, \ldots, b.$$

3. Let $NbOcc$ denote the number of occurrences of variables in $Q$. It is easy to see
that:

$$|\Psi_V| + |\Psi_E| = O(NbOcc).$$

Clearly, we also have $|\Psi_I| = O(|I|)$ and $|\Psi_{rel}| = O(k)$. That implies $|\varphi_Q| = O(|Q|)$.

□

Remark 2 Having a join forest F for Q is not necessary to construct the acyclic formula
$\Psi_E$ in $\varphi_Q$. There is an alternative way to obtain an equivalent $\Psi_E$ using the GYO rules
([Gra79, YO79]). Let $H_Q$ be the hypergraph associated to query Q. For each application of
rule 2 ("remove vertices that appear in at most one hyperedge") nothing has to be done. However,
each time rule 1 ("remove hyperedges contained in other hyperedge") is applied to some atoms
$A_u$ and $A_v$, one proceeds as for the original construction of $\Psi_E$: for each variable w that occurs
in $A_u$ and $A_v$, one equality of the form $f_i(t_u) = f_j(t_v)$ for two arbitrarily chosen indices i, j
such that $var_i(A_u) = w = var_j(A_v)$ is constructed.

Applying these rules until $H_Q$ is empty will result in a new acyclic formula $\Psi_E$.

Footnote (to Lemma 2, item 2). In case $Q(\mathbf{y})$ is a strict query with $\mathbf{y} \subseteq var(A_1)$, then $j_1 = \ldots = j_b = 1$.

4 Samples of unary functions

In this section, some simple combinatorial notions about unary functions are defined. 
They may be seen as some kind of set/table covering problem. They will be essential 
for proving the main results of this paper. 

Definition 3 Let E, F be two finite sets and $\mathbf{g} = (g_1, \ldots, g_k)$ be a tuple of unary functions
from E to F. Let $P \subseteq [k]$ and $(c_i)_{i \in P}$ be a family of elements of F. $(P, (c_i)_{i \in P})$ is said to be a
sample of $\mathbf{g}$ (indexed by P) over E if

$$E = \bigcup_{i \in P} g_i^{-1}(c_i)$$

where $g_i^{-1}(c_i)$ is the set of preimages of $c_i$ by function $g_i$. A sample is said to be minimal if,
moreover, for all $j \in P$:

$$E \neq \bigcup_{i \in P \setminus \{j\}} g_i^{-1}(c_i).$$

Finally, if $(P, (c_i)_{i \in P})$ is a sample (resp. minimal sample) of $\mathbf{g}$ over E, the family of sets
$(g_i^{-1}(c_i))_{i \in P}$ is called a covering (resp. minimal covering) of E by $\mathbf{g}$.

Samples will often simply be denoted $(c_1, \ldots, c_k)$ with $c_i$ = '—' when $i \notin P$.



Example 9 Let $\mathbf{g} = (g_1, g_2, g_3)$ be the following tuple of unary functions over some
domain/table T with tuples a, b, c, d, e.

         g_1   g_2   g_3
    a     1     2     4
    b     1     5     1
    c     3     2     4
    d     3     5     3
    e     5     2     4

It is easily seen that the tuples (1, 2, 3), (1, 5, 4), (3, 2, 1) and (—, 5, 4) are samples of $\mathbf{g}$
over T. Among them, (1, 2, 3), (3, 2, 1) and (—, 5, 4) are minimal.
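
The claims of Example 9 can be checked mechanically. The following is a minimal brute-force sketch, not the algorithm of the paper: the names `T`, `is_sample`, `is_minimal` and `all_candidates` are illustrative assumptions, and `None` plays the role of the placeholder for an index outside P.

```python
from itertools import product

# Table of Example 9: each element of T is mapped to (g1, g2, g3).
T = {"a": (1, 2, 4), "b": (1, 5, 1), "c": (3, 2, 4), "d": (3, 5, 3), "e": (5, 2, 4)}
K = 3


def is_sample(c, table=T):
    """c is a k-tuple of values; None means that the index is not in P."""
    return all(
        any(c[i] is not None and row[i] == c[i] for i in range(len(c)))
        for row in table.values()
    )


def is_minimal(c, table=T):
    """A sample is minimal if dropping any single index of P breaks the covering."""
    if not is_sample(c, table):
        return False
    for j in range(len(c)):
        if c[j] is not None:
            smaller = c[:j] + (None,) + c[j + 1:]
            if is_sample(smaller, table):
                return False
    return True


def all_candidates(table=T, k=K):
    """Only values in the image of g_i (or None) can be useful at position i."""
    images = [sorted({row[i] for row in table.values()}) + [None] for i in range(k)]
    return product(*images)


minimal = {c for c in all_candidates() if is_minimal(c)}
```

This enumeration is exponential in k and is only meant to validate small examples by hand.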



Remark 3 Let $P \subseteq [k]$ and $(c_i)_{i \in P}$ be a sample of $\mathbf{g}$ over E. Then, there exists $P' \subseteq P$
such that $(c_i)_{i \in P'}$ is a minimal sample of $\mathbf{g}$ over E. Informally, it is obtained by repeating the
following steps as long as possible:

- pick a j from P such that $E = \bigcup_{i \in P \setminus \{j\}} g_i^{-1}(c_i)$

- set $P \leftarrow P \setminus \{j\}$.

Note that the only minimal sample of $\mathbf{g}$ over the empty set is (—, —, ..., —).
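
The shrinking procedure of Remark 3 can be sketched as follows. This is an illustrative sketch under assumptions of our own: functions are given as a list `g` of dictionaries (indices start at 0), a sample is a dictionary from indices to values, and the helper names `covers` and `shrink_to_minimal` are hypothetical.

```python
def covers(E, g, sample):
    """True if every x in E satisfies g[i][x] == c for some (i, c) in the sample."""
    return all(any(g[i][x] == c for i, c in sample.items()) for x in E)


def shrink_to_minimal(E, g, sample):
    """Repeatedly drop an index j whose removal keeps E covered (Remark 3)."""
    current = dict(sample)
    if not covers(E, g, current):
        raise ValueError("the given family is not a sample of g over E")
    changed = True
    while changed:
        changed = False
        for j in list(current):
            rest = {i: c for i, c in current.items() if i != j}
            if covers(E, g, rest):
                current = rest  # index j is redundant: remove it
                changed = True
                break
    return current


# Functions of Example 9, as dictionaries over T = {a, b, c, d, e}.
E = ["a", "b", "c", "d", "e"]
g = [
    {"a": 1, "b": 1, "c": 3, "d": 3, "e": 5},
    {"a": 2, "b": 5, "c": 2, "d": 5, "e": 2},
    {"a": 4, "b": 1, "c": 4, "d": 3, "e": 4},
]
shrunk = shrink_to_minimal(E, g, {0: 1, 1: 5, 2: 4})
```

On the sample (1, 5, 4) of Example 9 the first index is redundant, so the procedure returns the minimal sample written (—, 5, 4) in the text. Over the empty set every index is redundant, which matches the note closing Remark 3.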
In the rest of this section, problems about minimal samples are defined and their
complexities are studied. Those problems will play a key role in the paper.

Min-Samples

Input: two finite sets E and F and a k-tuple of unary functions $\mathbf{g} = (g_1, \ldots, g_k)$ from E to F.
Parameter: integer k.

Output: the set of minimal samples of $\mathbf{g}$ over E.

Lemma 3 Let E, F, $\mathbf{g}$ be an input of Min-Samples.

1. There are at most k! minimal samples of $\mathbf{g}$ over E.

2. Problem Min-Samples can be solved in time $O_k(|E|)$.

Proof. Let us identify E with the set $\{1, \ldots, n\}$. We describe the construction of a
tree T of depth n with at most k! leaves and hence at most $k!\,|E| = O_k(n)$ nodes. The
leaves represent all the minimal samples of $\mathbf{g}$ over E. Level i of the tree corresponds to
element i of E. Each node x of level i is labelled by a subset $P^x \subseteq [k]$ and by a sample
$(c_j^x)_{j \in P^x}$ of $\mathbf{g}$ over $\{1, \ldots, i\}$. The root r of the tree is labelled by $P^r = \emptyset$.

Let i = 1. There are at most k possibilities to cover element 1 with $g_h(1) = c_h$
($h = 1, \ldots, k$). Then, the root of T has k children, each labelled by $(P^{x_h} = \{h\}, c_h)$.

At each level i ($i = 2, \ldots, n$), the same strategy is used. Let x be a node of level $i - 1$
labelled by $(P^x, (c_j^x)_{j \in P^x})$. The set of children y of x labelled by $(P^y, (c_j^y)_{j \in P^y})$ will
correspond to all the possibilities to extend the covering of $\{1, \ldots, i - 1\}$ by $(P^x, (c_j^x)_{j \in P^x})$
in a minimal way in order to cover node i (if i is not already covered).

Testing whether i is already covered, i.e., whether $i \in \bigcup_{j \in P^x} g_j^{-1}(c_j^x)$, can be done in constant
time $O_k(1)$: it suffices to test the disjunction $\bigvee_{j \in P^x} g_j(i) = c_j^x$. Two cases may occur:

• Either $i \in \bigcup_{j \in P^x} g_j^{-1}(c_j^x)$: in that case, node x has a unique child node y of level i
with $P^y = P^x$ and $\forall j \in P^x$, $c_j^y = c_j^x$.

• Or $i \notin \bigcup_{j \in P^x} g_j^{-1}(c_j^x)$. Two subcases may hold.

  - Either $P^x = [k]$. Then, it is not possible to cover element i, the construction
    fails and stops here for that branch.

  - Or $P^x \neq [k]$. Then, for each $h \in [k] \setminus P^x$, one constructs a child node y for x
    such that: $P^y = P^x \cup \{h\}$, $c_j^y = c_j^x$ for $j \in P^x$ and $c_h^y = g_h(i)$. Node y and its
    label can be constructed in constant time.

That process ends after $O_k(|E|)$ steps with a tree of size $O_k(|E|)$ whose leaves represent
the (up to) k! minimal samples of $\mathbf{g}$ over E: $(c_1, \ldots, c_k)$ with $c_i$ = '—' when
$i \notin P$ (see the footnote below). □
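
The tree construction of the proof can be sketched level by level as follows. This is a minimal sketch under our own assumptions: `g` is a list of dictionaries with 0-based indices, samples are returned as k-tuples with `None` for unused indices, and the final minimality filter is done by a direct check on each leaf rather than by the constant-time post-processing that the paper alludes to.

```python
def min_samples(E, g):
    """Return the set of minimal samples of g over E as k-tuples (None = unused index)."""
    E = list(E)
    k = len(g)
    level = [{}]  # the root is labelled by the empty index set
    for x in E:  # one tree level per element of E
        next_level = []
        for s in level:
            if any(g[j][x] == c for j, c in s.items()):
                next_level.append(s)  # x already covered: unique child, same label
            else:
                # x not covered: one child per index h not yet used
                # (no child at all if every index is used: the branch fails)
                for h in range(k):
                    if h not in s:
                        child = dict(s)
                        child[h] = g[h][x]
                        next_level.append(child)
        level = next_level

    def covers(s):
        return all(any(g[j][x] == c for j, c in s.items()) for x in E)

    def is_minimal(s):
        return all(not covers({i: c for i, c in s.items() if i != j}) for j in s)

    # Some leaves may be non-minimal samples: keep the minimal ones only.
    return {tuple(s.get(i) for i in range(k)) for s in level if is_minimal(s)}


# Functions of Example 9.
g = [
    {"a": 1, "b": 1, "c": 3, "d": 3, "e": 5},
    {"a": 2, "b": 5, "c": 2, "d": 5, "e": 2},
    {"a": 4, "b": 1, "c": 4, "d": 3, "e": 4},
]
result = min_samples(["a", "b", "c", "d", "e"], g)
```

Each branching step consumes a fresh index, so a level never holds more than k! nodes; on Example 9 the six leaves include the non-minimal sample (1, 5, 4), which the filter discards.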

Footnote. To be completely rigorous, we should mention that some of the (up to) k! samples that our algorithm
constructs may not be minimal. The essential property is that, by construction, each minimal sample
(of $\mathbf{g}$ over E) is included in (at least) one of the constructed samples. The algorithm above should
be completed by a variant of the algorithm of Remark 3 that extracts from a sample all the minimal
samples that it contains. Note that the whole additional time required is $O_k(1)$ and that some minimal
samples may be repeated.

We will also need a more elaborate problem about samples. Let l be an integer and
$\mathbf{v}$ be a function from E to $E^l$. The image set of $\mathbf{v}$ is denoted by $\mathbf{v}(E)$. It is clear that
the collection of sets $\mathbf{v}^{-1}(\mathbf{a})$ for $\mathbf{a} = (a_1, \ldots, a_l) \in \mathbf{v}(E) \subseteq E^l$ forms a partition of E.
Let us define the following problem.

Min-Samples-Partition

Input: two finite sets E and F, two integers k and l, a k-tuple of unary
functions $\mathbf{g} = (g_1, \ldots, g_k)$ from E to F and a function $\mathbf{v}$ from
E to $E^l$.

Parameter: integers k and l.

Output: for each $\mathbf{a} \in \mathbf{v}(E) \subseteq E^l$, the set $M(\mathbf{a})$ of minimal samples of $\mathbf{g}$
over $\mathbf{v}^{-1}(\mathbf{a})$.

Lemma 4 Problem Min-Samples-Partition can be solved in time $O_{k,l}(|E|)$.

Proof. The algorithm is the following.

1. Compute the set $S = \{(\mathbf{v}(x), x) : x \in E\}$ in time $O_l(|E|)$.

2. Sort S by values of $\mathbf{v}(x)$: this computes the partition $(\mathbf{v}^{-1}(\mathbf{a}))_{\mathbf{a} \in \mathbf{v}(E)}$ of E in time
$O_l(|E|)$.

3. For each $\mathbf{a} \in \mathbf{v}(E)$, compute the set $M(\mathbf{a})$ of minimal samples of $\mathbf{g}$ over $\mathbf{v}^{-1}(\mathbf{a})$
in time $O_k(|\mathbf{v}^{-1}(\mathbf{a})|)$ (by Lemma 3). The total time required for this last step is
$O_k(\sum_{\mathbf{a} \in \mathbf{v}(E)} |\mathbf{v}^{-1}(\mathbf{a})|) = O_k(|E|)$.

□
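
The three steps of this proof can be sketched as follows. This is an illustrative sketch under assumptions of our own: `v` is a dictionary from elements to tuples, `min_samples` is a compact level-by-level routine in the spirit of Lemma 3, and Python's comparison sort (which costs $O(|E| \log |E|)$) stands in for the linear-time lexicographic sort assumed by the lemma.

```python
from itertools import groupby


def min_samples(E, g):
    """Minimal samples of g over E, as k-tuples with None for unused indices."""
    k = len(g)
    level = [{}]
    for x in E:
        nxt = []
        for s in level:
            if any(g[j][x] == c for j, c in s.items()):
                nxt.append(s)
            else:
                nxt.extend({**s, h: g[h][x]} for h in range(k) if h not in s)
        level = nxt

    def covers(s):
        return all(any(g[j][x] == c for j, c in s.items()) for x in E)

    def is_minimal(s):
        return all(not covers({i: c for i, c in s.items() if i != j}) for j in s)

    return {tuple(s.get(i) for i in range(k)) for s in level if is_minimal(s)}


def min_samples_partition(E, g, v):
    """Map each value a of v to the minimal samples of g over the class v^{-1}(a)."""
    # Steps 1 and 2: build the pairs (v(x), x) and sort them by v(x).
    S = sorted(((v[x], x) for x in E), key=lambda pair: pair[0])
    # Step 3: solve Min-Samples on each block of the partition.
    return {
        a: min_samples([x for _, x in block], g)
        for a, block in groupby(S, key=lambda pair: pair[0])
    }


# Functions of Example 9, with a hypothetical v splitting T into {a, b} and {c, d, e}.
g = [
    {"a": 1, "b": 1, "c": 3, "d": 3, "e": 5},
    {"a": 2, "b": 5, "c": 2, "d": 5, "e": 2},
    {"a": 4, "b": 1, "c": 4, "d": 3, "e": 4},
]
v = {"a": (0,), "b": (0,), "c": (1,), "d": (1,), "e": (1,)}
M = min_samples_partition(["a", "b", "c", "d", "e"], g, v)
```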



Remark 4 The "sampling" problems and their algorithms that are involved in Lemmas 3 and 4
can be seen as generalizations of the well-known k-VERTEX COVER problem in graphs and
its algorithm of parameterized linear complexity $O_k(|G|)$ (see [DF99]).

Let $G = (V; E)$ be a graph and $\mathcal{F}_G = (D; f_1, f_2)$ be its functional representation: $D = V \cup E$
and for each $e \in E$, $f_1(e)$ and $f_2(e)$ describe the endpoints of e. Then $C = \{c_1, \ldots, c_k\} \subseteq V$
is a k-VERTEX COVER of G if

$$\forall x \in E,\ f_1(x) = c_1 \vee \ldots \vee f_1(x) = c_k \vee f_2(x) = c_1 \vee \ldots \vee f_2(x) = c_k.$$

In other words, $(c_1, \ldots, c_k, c_1, \ldots, c_k)$ is a $(f_1, \ldots, f_1, f_2, \ldots, f_2)$-sampling of E.

5 The complexity of functional acyclic queries 

Roughly, the main technique of this paper shows, among other things, that it is possible
to eliminate quantified variables in an acyclic conjunctive query (by transforming both
the query and the structure) without overhead in the query evaluation process, i.e., so
that evaluating the simplified query is just as hard as evaluating the original query.

For the sake of clarity, the main result will first be stated in the context of F-AFO
queries. Let us explain the method on a very simple example. Let $\varphi$ be the following
Boolean F-AFO query (without negation and only two variables):

$$\varphi \equiv \forall x \forall y : f_1(x) = g_1(y) \vee \ldots \vee f_k(x) = g_k(y).$$

A first naive approach for evaluating $\varphi$ against a given unary functional structure
$\mathcal{F} = (D; \mathbf{f}, \mathbf{g})$ consists in testing the truth value of the matrix for every possible value of
$(x, y)$: that requires time $O_k(|D|^2)$. Alternatively, $\varphi$ can be interpreted as follows: for
each value of x, the family of sets $g_i^{-1}(f_i(x))$, for $i \in \{1, \ldots, k\}$, is a covering of D. In
other words, for each x, there exists a sample $(P, (c_i)_{i \in P})$ of $\mathbf{g}$ over D (with initially
$P = [k]$) that "agrees" with the values of $\mathbf{f}(x)$, i.e., such that:

$$\bigwedge_{i \in P} f_i(x) = c_i. \qquad (1)$$

Such a sample can be chosen among minimal ones (recall Remark 3). Then, evaluating
$\varphi$ against $\mathcal{F}$ can be done as follows. First, the set of the (up to) k! minimal samples of
$\mathbf{g}$ over D is computed. Then, for each x, one looks for one of these minimal samples
that satisfies Property (1). Because of Lemma 3, the whole process requires $O_k(|D|)$ steps.
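
The two evaluation strategies just described can be compared on a toy structure. This is a sketch under our own assumptions, not the implementation of the paper: functions are lists of dictionaries, `min_samples` is a compact version of the tree construction of Lemma 3, and the names `holds_naive` and `holds_with_samples` are hypothetical.

```python
def min_samples(E, g):
    """Minimal samples of g over E, as k-tuples with None for unused indices."""
    k = len(g)
    level = [{}]
    for x in E:
        nxt = []
        for s in level:
            if any(g[j][x] == c for j, c in s.items()):
                nxt.append(s)
            else:
                nxt.extend({**s, h: g[h][x]} for h in range(k) if h not in s)
        level = nxt

    def covers(s):
        return all(any(g[j][x] == c for j, c in s.items()) for x in E)

    def is_minimal(s):
        return all(not covers({i: c for i, c in s.items() if i != j}) for j in s)

    return {tuple(s.get(i) for i in range(k)) for s in level if is_minimal(s)}


def holds_naive(D, f, g):
    """Quadratic check: test the matrix for every pair (x, y)."""
    k = len(g)
    return all(any(f[i][x] == g[i][y] for i in range(k)) for x in D for y in D)


def holds_with_samples(D, f, g):
    """For each x, look for a minimal sample of g over D that agrees with f(x)."""
    samples = min_samples(D, g)
    return all(
        any(all(c is None or f[i][x] == c for i, c in enumerate(s)) for s in samples)
        for x in D
    )


D = [0, 1, 2, 3]
g = [{0: 0, 1: 0, 2: 1, 3: 1}, {0: 2, 1: 2, 2: 3, 3: 3}]
f_true = [{0: 0, 1: 1, 2: 0, 3: 1}, {0: 3, 1: 2, 2: 3, 3: 2}]
f_false = [{0: 0, 1: 1, 2: 0, 3: 1}, {0: 2, 1: 2, 2: 3, 3: 2}]
```

In this toy structure the minimal samples of `g` over `D` are (0, 3) and (1, 2), so the second strategy only compares each tuple $\mathbf{f}(x)$ with two candidates instead of scanning all of `D` again.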

With some more work, this basic idea can be extended to the general case of (not
necessarily Boolean) acyclic queries where both equalities and inequalities are allowed.
This is achieved through the main result that follows.

Theorem 5 The F-AFO$_1$ query problem can be solved in time $f(|\varphi|).|D|$ where D is the
domain of the input structure $\mathcal{F}$, $\varphi$ is the input formula and f is a fixed function from $\mathbb{N}$ to $\mathbb{R}^+$.

Proof. Let a unary functional structure $\mathcal{F}$ of domain D and a formula $\varphi(x) \equiv \forall \mathbf{y}\, \phi(x, \mathbf{y})$
with $\mathbf{y} = (y_1, \ldots, y_d)$ be the inputs of an F-AFO$_1$ query problem. Since clauses may be
reduced independently, it can be supposed that $\varphi(x)$ contains only one clause $\phi$. The
proof is done by induction on the number of variables of $\varphi$.

Let $G_\varphi$ denote the acyclic graph associated to $\varphi$ whose set of vertices is $var(\varphi)$.
Recall that $G_\varphi$ only takes into account the inequalities of the clause $\phi$. Without loss of
generality, assume that $G_\varphi$ is connected, i.e., is a tree T, and choose x as the root of T.
Order the nodes of T, i.e., the variables of $\varphi$, by increasing levels from the root to the
leaves, as $y_0 = x, y_1, \ldots, y_d$. Note that the restriction of T to the subset of variables
$\{y_0, y_1, \ldots, y_i\}$ for $i \leq d$ is a subtree $T_i$ where $y_i$ is a leaf. The variables of $\varphi$ except
$y_0 = x$ will be eliminated one by one according to this ordering. Let $y_{i_0}$, $i_0 \leq d - 1$, be
the parent of leaf $y_d$ in T. W.l.o.g., assume that $\varphi$ is of the form:

$$\varphi(y_0) \equiv \forall y_1 \ldots \forall y_d : \bigvee_{j \leq l} v_j(y_d) \neq u_j(y_{i_0}) \vee \psi(\mathbf{y}) \vee \bigvee_{j \leq k} g_j(y_d) = f_j(y_{p_j})$$

where $\mathbf{y}$ is now the d-tuple of variables $(y_0, y_1, \ldots, y_{d-1})$, the $u_j$, $v_j$, for $1 \leq j \leq l$,
and $f_j$, $g_j$, for $1 \leq j \leq k$, are unary function symbols, $\psi(\mathbf{y})$ is an acyclic clause over
$\mathbf{y}$ of associated graph $G_\psi = T_{d-1}$, and for each $j \leq k$, $0 \leq p_j \leq d - 1$.

Footnote. In case $\phi$ contains one-variable atoms of the form $(\neg) u(y_d) = v(y_d)$ or $(\neg) U(y_d)$, we can easily
replace them by two-variable positive atoms by expanding the signature and the structure by new unary
functions computable in linear time.

Replacing our disjunction of negated atoms by an implication, one obtains:

$$\varphi(y_0) \equiv \forall y_1 \ldots \forall y_{d-1} \forall y_d : \mathbf{v}(y_d) = \mathbf{u}(y_{i_0}) \rightarrow \Big(\psi(\mathbf{y}) \vee \bigvee_{j \leq k} g_j(y_d) = f_j(y_{p_j})\Big)$$

where $\mathbf{v}(y_d) = \mathbf{u}(y_{i_0})$ stands for $\bigwedge_{j \leq l} v_j(y_d) = u_j(y_{i_0})$. Formula $\varphi$ can be equivalently
written as:

$$\varphi(y_0) \equiv \forall y_1 \ldots \forall y_{d-1} : \psi(\mathbf{y}) \vee \Big[\forall y_d \in \mathbf{v}^{-1}(\mathbf{u}(y_{i_0}))\ \bigvee_{j \leq k} g_j(y_d) = f_j(y_{p_j})\Big].$$

j<k 

The second disjunct states that $(f_j(y_{p_j}))_{j \leq k}$ is a sample of $\mathbf{g}$ over $\mathbf{v}^{-1}(\mathbf{u}(y_{i_0}))$ and
hence contains such a minimal sample.

The family $\mathcal{M} = \{(b, M(\mathbf{u}(b))) : b \in D\}$ of the sets of minimal samples $M(\mathbf{u}(b)) =
\{(c_1^h(b), \ldots, c_k^h(b)) : 1 \leq h \leq k!\}$ of $\mathbf{g}$ over $\mathbf{v}^{-1}(\mathbf{u}(b))$ is computed by Algorithm A below
(since the number of minimal samples is only bounded by k!, there may be repetitions
of identical samples).

Algorithm A:

1. Compute the family $A = \{(\mathbf{a}, M(\mathbf{a})) : \mathbf{a} \in \mathbf{v}(D)\}$ where $M(\mathbf{a})$ is the set of minimal
samples of $\mathbf{g}$ over $\mathbf{v}^{-1}(\mathbf{a})$: this amounts to solving the problem Min-Samples-Partition
in time $O_{k,l}(|D|)$ by Lemma 4, for $E = F = D$ and the function $\mathbf{v}$.

2. Sort A in lexicographic order according to $\mathbf{a}$.

3. Compute and lexicographically sort the set $B = \{(\mathbf{u}(b), b) : b \in D\}$ in time $O_l(|D|)$.

4. Merge the sorted lists A and B (in time $O_{k,l}(|D|)$) to compute the set

$$C = \{(\mathbf{u}(b), M(\mathbf{u}(b)), b) : b \in D\}.$$

5. Return the family of sets (of minimal samples) $\mathcal{M} = \{(b, M(\mathbf{u}(b))) : b \in D\}$.

Hence, the time complexity of Algorithm A is $O_{k,l}(|D|)$. In the set $\mathcal{M}$, we are
interested, of course, in the elements b for which $M(\mathbf{u}(b))$ is not empty. Let:

$$K = \{b : M(\mathbf{u}(b)) \neq \emptyset\}.$$

We now eliminate variable $y_d$ by expanding the signature of the query and the
structure (a classical method in quantifier elimination). New unary relations K, $S_j^h$ and
functions $c_j^h$ (for $h \leq k!$ and $j \leq k$) are introduced.

Functions $c_j^h$ are those that appear in the description of the minimal samples. Predicates
$S_j^h$ are defined from $c_j^h$ as follows: for all j, h and $y \in D$,

$$S_j^h(y) \iff y \in K \text{ and } c_j^h(y) \neq \text{'—'}.$$

Let $\mathcal{F}'$ be the expansion of structure $\mathcal{F}$ defined as $\mathcal{F}' = (\mathcal{F}, (S_j^h, c_j^h)_{h \leq k!,\, j \leq k})$. Let
now $\varphi'$ be the following formula having d variables $y_0, \ldots, y_{d-1}$ (recall $i_0 < d$):

$$\varphi'(y_0) \equiv \forall y_1 \ldots \forall y_{d-1} : \psi(\mathbf{y}) \vee \Big[K(y_{i_0}) \wedge \bigvee_{h \leq k!} \bigwedge_{j \leq k} \big(S_j^h(y_{i_0}) \rightarrow f_j(y_{p_j}) = c_j^h(y_{i_0})\big)\Big]$$

The last part of formula $\varphi'$ simply asserts that if j belongs to the index set of the h-th
minimal sample of $\mathbf{g}$ over $\mathbf{v}^{-1}(\mathbf{u}(y_{i_0}))$ then $f_j(y_{p_j})$ must be equal to the j-th value
of the h-th minimal sample. That means that $(f_j(y_{p_j}))_{j \leq k}$ contains a minimal sample
of $\mathbf{g}$ over $\mathbf{v}^{-1}(\mathbf{u}(y_{i_0}))$. It is then clear that, for each possible value a of $x = y_0$, it
holds $(\mathcal{F}, a) \models \varphi(x) \iff (\mathcal{F}', a) \models \varphi'(x)$, that is, $\varphi(\mathcal{F}) = \varphi'(\mathcal{F}')$. Notice that the
last part of formula $\varphi'$ does not really introduce negative atoms: it can be rephrased
as $\bigvee_{h \leq k!} \bigwedge_{j \leq k} (S_j^h(y_{i_0}) = 0 \vee f_j(y_{p_j}) = c_j^h(y_{i_0}))$ where $S_j^h$ is now regarded as a unary
function from D to $\{0, 1\}$. From the previous paragraphs, the two following facts also
clearly hold.

Fact 1 The expansion $\mathcal{F}'$ of structure $\mathcal{F}$ can be computed in time $O_{k,l}(|D|)$.

Fact 2 Formula $\varphi'(x)$ can be easily transformed into a conjunction of acyclic clauses each having
d variables and associated tree $T_{d-1}$.

By iterating this process d times, i.e., eliminating successively variables $y_d, y_{d-1}, \ldots, y_1$,
one obtains in time $O_\varphi(|D|)$ an expansion $\mathcal{F}'$ of $\mathcal{F}$ and a quantifier-free formula $\varphi'(x)$
with only one variable $x = y_0$. It is clear that the final query $\varphi'(\mathcal{F}') = \{a \in D : (\mathcal{F}', a) \models
\varphi'(x)\} = \varphi(\mathcal{F})$ can be computed in linear time $O_{\varphi'}(|D|)$. □

Remark 5 (On the constant value $f(|\varphi|)$) In the worst case, the value of $f(|\varphi|)$ may be
huge: each elimination step may introduce a number of new atoms bounded by k! (and requires
putting the new formula in conjunctive normal form for the next step).

A very interesting particular case concerns F-AFO$_1$ queries without positive atoms (closely
related to the ACQ$_1$ problem). In that case, formula $\varphi(x)$ is of the following form, for some
$i_0 < d$:

$$\varphi(x) \equiv \forall y_1 \ldots \forall y_{d-1} \forall y_d : \bigvee_{j \leq l} v_j(y_d) \neq u_j(y_{i_0}) \vee \psi(\mathbf{y})$$
$$\equiv \forall y_1 \ldots \forall y_{d-1} : \psi(\mathbf{y}) \vee \neg \exists y_d\, (\mathbf{v}(y_d) = \mathbf{u}(y_{i_0}))$$

with $\mathbf{y} = (y_0, \ldots, y_{d-1})$. It is easy to see that one can compute, in time $O(l.|D|)$, the set $D_0$ of
elements $y \in D$ such that $\mathbf{u}(y) \in \mathbf{v}(D)$ (i.e., such that there exists $y_d$ with $\mathbf{v}(y_d) = \mathbf{u}(y)$). By
enlarging the signature, the formula $\varphi$ can be transformed into an equivalent formula without
variable $y_d$ (also denoted by $\varphi$ for convenience):

$$\varphi(x) \equiv \forall y_1 \ldots \forall y_{d-1} : \psi(\mathbf{y}) \vee \neg D_0(y_{i_0})$$

Note that, although a new atom $D_0(y_{i_0})$ has been introduced, the sum of the number of
quantifiers plus the number of literals of $\varphi$ has been decreased by l. Summing up the costs of all the
steps, it follows that F-AFO$_1$ queries without positive atoms can be evaluated in time $O(|\varphi|.|D|)$.

We are now able to state the consequences of our results, first in the context of
acyclic conjunctive functional queries.

Theorem 6 The query problem F-ACQ$_1^{\neq}$ (resp. F-ACQ$_1$) can be solved in time $f(|\varphi|).|D|$
(resp. $O(|\varphi|.|D|)$).

Proof. Let $\mathcal{F}$ and $\varphi(x)$ be inputs of the F-ACQ$_1^{\neq}$ (resp. F-ACQ$_1$) problem. By definition,
$\neg\varphi(x)$ defines the F-AFO$_1$ query (resp. F-AFO$_1$ query without positive atoms)
whose output is $D \setminus \varphi(\mathcal{F})$. By Theorem 5, Remark 5 and the fact that $\varphi(\mathcal{F})$ can be
computed from $D \setminus \varphi(\mathcal{F})$ in time $O(|D|)$, we are done. □

Concerning F-ACQ$_1^{+}$ queries, the following result can be proved.

Theorem 7 The query problem F-ACQ$_1^{+}$ can be solved in time $O(|\varphi|.|D|)$.

Proof. The proof, which generalizes the proof for F-ACQ$_1$, is similar to and, in
several aspects, simpler than that of the corresponding result for F-ACQ$_1^{\neq}$. Let us mention
essentially the differences. W.l.o.g., let $\varphi \in$ F-ACQ$_1^{+}$ be a formula of the form

$$\varphi(x) \equiv \exists y_1 \ldots \exists y_{d-1} \exists y_d : \psi(y_0, y_1, \ldots, y_{d-1}) \wedge \mathbf{u}(y_{i_0}) = \mathbf{v}(y_d) \wedge f(y_{i_0})\, \theta\, g(y_d) \wedge \gamma(y_d)$$

where $y_0$ is x, $\theta \in \{\neq, <, \leq, >, \geq\}$, $0 \leq i_0 \leq d - 1$, $\gamma(y_d)$ is a quantifier-free formula on
the unique variable $y_d$, and $\mathbf{u}(y_{i_0}) = \mathbf{v}(y_d)$ stands for $\bigwedge_{j \leq l} u_j(y_{i_0}) = v_j(y_d)$. Formula $\varphi$
can be equivalently written as:

$$\varphi(x) \equiv \exists y_1 \ldots \exists y_{d-1} : \psi(\mathbf{y}) \wedge \delta(y_{i_0})$$

where $\mathbf{y} = (y_0, y_1, \ldots, y_{d-1})$ and $\delta$ is the following two-variable formula:

$$\delta(y) \equiv \exists z : \gamma(z) \wedge \mathbf{u}(y) = \mathbf{v}(z) \wedge f(y)\, \theta\, g(z)$$

The key point is the following:

Lemma 8 The set $D_0 = \delta(\mathcal{F}) = \{a \in D : (\mathcal{F}, a) \models \delta(y)\}$ is computable in time $O(|\delta|.|D|)$.

Proof. The set $B = \gamma(\mathcal{F}) = \{b \in D : (\mathcal{F}, b) \models \gamma(z)\}$ is obviously computable in time
$O(|\gamma|.|D|)$. Assume that the comparison symbol $\theta$ is $<$ (the other cases are variants of
this case). Now, compute and lexicographically sort the following lists of $(l + 3)$-tuples
(in time $O(l.|D|)$):

$Y = \{(\mathbf{u}(y), f(y), 1, y) : y \in D\}$, and

$Z = \{(\mathbf{v}(z), g(z), 0, z) : z \in B\}$.

Then, merge the sorted lists Y, Z into the sorted list L. It is easy to see that the
following fact holds:

Fact 3 $\delta(\mathcal{F})$ is the set of elements $y \in D$ such that there exists $z \in B$ such that $\mathbf{u}(y) = \mathbf{v}(z)$
and $(\mathbf{u}(y), f(y), 1, y)$ occurs before $(\mathbf{v}(z), g(z), 0, z)$ in L.

Using this fact, the following algorithm computes $\delta(\mathcal{F})$ (knowing set B) in time
$O(l.|D|)$.

• Partition the sorted list L into nonempty (sorted) sublists $L(\mathbf{a})$, for $\mathbf{a} \in \mathbf{u}(D) \cup
\mathbf{v}(B)$, according to the first l-tuple $\mathbf{u}(y) = \mathbf{a}$ or $\mathbf{v}(z) = \mathbf{a}$ of each tuple.

• In each sorted list $L(\mathbf{a})$, compute the last tuple, denoted by $Max_B(\mathbf{a})$, of the form
$(\mathbf{v}(z) = \mathbf{a}, g(z), 0, z)$, $z \in B$, with $g(z)$ maximal, if such an element exists. Otherwise,
set $Max_B(\mathbf{a}) = -\infty$.

• In each $L(\mathbf{a})$, compute the list $L^{<}(\mathbf{a})$ of the tuples of the form $(\mathbf{u}(y) = \mathbf{a}, f(y), 1, y)$
that occur before $Max_B(\mathbf{a})$ in $L(\mathbf{a})$. By convention, $L^{<}(\mathbf{a})$ is empty in case
$Max_B(\mathbf{a}) = -\infty$.

• Return the set of elements y that appear in the lists $L^{<}(\mathbf{a})$. By Fact 3, this is
clearly the required set $\delta(\mathcal{F})$.

Globally, $\delta(\mathcal{F})$ is computed in time $O((|\gamma| + l).|D|) = O(|\delta|.|D|)$. This proves the
lemma. □
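
The sort-and-scan idea behind Lemma 8 can be sketched as follows for the case where $\theta$ is $<$. This is an illustrative sketch under our own assumptions: `u` and `v` are Python callables returning tuples, `f` and `g` return numbers, the name `delta_set` is hypothetical, and Python's comparison sort (not linear time) replaces the lexicographic sort of the proof. A single backward scan plays the role of the sublists $L(\mathbf{a})$ and of $Max_B(\mathbf{a})$.

```python
def delta_set(D, B, u, v, f, g):
    """Return {y in D : there is z in B with u(y) == v(z) and f(y) < g(z)}."""
    # Tag 1 marks y-tuples, tag 0 marks z-tuples, as in the lists Y and Z.
    merged = [(u(y), f(y), 1, y) for y in D] + [(v(z), g(z), 0, z) for z in B]
    # Sort on (key, value, tag) only, so elements themselves are never compared.
    merged.sort(key=lambda t: t[:3])

    result = set()
    have_z = False
    last_key = None  # key of the closest z-tuple located after the current position
    for key, _, tag, elem in reversed(merged):
        if tag == 0:
            have_z, last_key = True, key
        elif have_z and last_key == key:
            # Some z-tuple with the same key occurs after this y-tuple (Fact 3).
            result.add(elem)
    return result


D = list(range(6))
B = [2, 3]
parity = lambda x: (x % 2,)
identity = lambda x: x
found = delta_set(D, B, parity, parity, identity, identity)
brute = {y for y in D if any(parity(y) == parity(z) and y < z for z in B)}
```

With ties on the value, a y-tuple (tag 1) is sorted after the z-tuple (tag 0), so equality $f(y) = g(z)$ is correctly rejected for the strict comparison.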

End of proof of Theorem 7: Let $\mathcal{F}'$ be the expansion of structure $\mathcal{F}$ defined as $\mathcal{F}' = (\mathcal{F}, D_0)$
where $D_0$ is the unary predicate defined as $D_0 = \delta(\mathcal{F})$. Let $\varphi'$ denote the following
formula, of signature expanded with $D_0$:

$$\varphi'(x) \equiv \exists y_1 \ldots \exists y_{d-1} : \psi(x, y_1, \ldots, y_{d-1}) \wedge D_0(y_{i_0})$$

where $y_{i_0} \in \{x, y_1, \ldots, y_{d-1}\}$. By construction, we have:

Fact 4 $\varphi(\mathcal{F}) = \varphi'(\mathcal{F}')$.

In order to simply compare the lengths of $\varphi$ and $\varphi'$, let us introduce a simplified
notion of formula length: let $|\varphi|_s$ denote the number of quantifiers of $\varphi$ plus its number
of occurrences of atoms. Clearly, it holds: $|\varphi| = \Theta(|\varphi|_s)$. By construction, we get the
following fact:

Fact 5 $|\varphi'|_s = |\varphi|_s - |\delta|_s + 1$.

Lemma 8 immediately yields the following:

Fact 6 The expansion $\mathcal{F} \mapsto \mathcal{F}'$, i.e., the computation of the added unary relation $D_0 = \delta(\mathcal{F})$,
is performed in time $O(|\delta|_s.|D|)$.

Iterating the transformation $(\mathcal{F}, \varphi) \mapsto (\mathcal{F}', \varphi')$ d times allows one to eliminate
successively the quantified variables $y_d, y_{d-1}, \ldots, y_1$; this can be performed in total time
$O((|\varphi|_s + d).|D|)$ by Facts 5 and 6, and hence in time $O(|\varphi|.|D|)$ as required. This
completes the proof of the theorem. □

Remark 6 Allowing more than one comparison along the edges of the tree decomposition leads
to a class of queries that seems intrinsically non-linear. Consider the following very simple
formula with two comparisons:

$$\exists x \exists y : f_1(x) < g_1(y) \wedge f_2(x) < g_2(y).$$

Finding two satisfying witnesses x and y amounts to finding lexicographically ordered pairs
$(f_1(x), f_2(x))$ and $(g_1(y), g_2(y))$, which does not seem doable in linear time (even if the "tables"
$(f_1, f_2)$ and $(g_1, g_2)$ are already sorted).

The following theorem states the complexity of our (functional) acyclic queries in
the general case.

Theorem 9 The F-ACQ$^{\neq}$ (resp. F-ACQ, F-ACQ$^{+}$) query problem can be solved in time
$f(|\varphi|).|D|.|\varphi(\mathcal{F})|$ (resp. $O(|\varphi|.|D|.|\varphi(\mathcal{F})|)$) for some function f.

Proof. We prove that, for any function $f : \mathbb{N} \rightarrow \mathbb{R}^+$, if problem F-ACQ$_1^{\neq}$ (resp.
F-ACQ$_1^{+}$) can be solved in time $f(|\varphi|).|D|$ then problem F-ACQ$^{\neq}$ (resp. F-ACQ$^{+}$) can
be solved in time $f(|\varphi|).|D|.|\varphi(\mathcal{F})|$ for the same function f. Combined with Theorems 6
and 7, this yields the desired result.

Let $\mathcal{F}$ be a functional structure and $\varphi(x_1, \ldots, x_k)$ be a formula for the query problem
F-ACQ$^{+}$ or F-ACQ$^{\neq}$. For $i = 1, \ldots, k$, let:

$$E_i = \{(x_1, \ldots, x_i) \in D^i : (\mathcal{F}, x_1, \ldots, x_i) \models \exists x_{i+1} \ldots \exists x_k\, \varphi\}$$

Obviously, $\varphi(\mathcal{F}) = E_k$. Sets $E_1, \ldots, E_k$ are computed inductively by the following
algorithm that only evaluates strict acyclic queries as subroutines.

$E_1 \leftarrow \{x_1 \in D : (\mathcal{F}, x_1) \models \exists x_2 \ldots \exists x_k\, \varphi\}$
For i from 2 to k do
    $E_i \leftarrow \emptyset$
    For all $(x_1, \ldots, x_{i-1}) \in E_{i-1}$ do
        $S \leftarrow \{x_i \in D : (\mathcal{F}, x_1, \ldots, x_i) \models \exists x_{i+1} \ldots \exists x_k\, \varphi\}$   (*)
        $E_i \leftarrow E_i \cup \{(x_1, \ldots, x_{i-1}, x_i) : x_i \in S\}$
    End
End
$\varphi(\mathcal{F}) \leftarrow E_k$

The main step of the algorithm, that is step (*), requires time $f(|\varphi|).|D|$. It is repeated
a number of times bounded by:

$$card(E_1) + card(E_2) + \ldots + card(E_k) \leq k.card(E_k) = |E_k| = |\varphi(\mathcal{F})|$$

This yields total time $f(|\varphi|).|D|.|\varphi(\mathcal{F})|$. □

Finally, let us give another consequence of our results in the functional setting.
Any two-variable quantifier-free (CNF) functional formula $\psi(x, y)$ is acyclic because
any undirected graph with at most two vertices is acyclic. Let F-FO*"^'' denote the set
of functional first-order formulas (not necessarily in prenex form) with only two variables
x, y which may be quantified several times. Denote by F-FO^i'"'^^ its restriction to strict
queries.

Corollary 10 The F-FO^i'"'^^ query problem is computable in time $O_\varphi(|\mathcal{F}|)$.

Proof. The proof is done by induction on the structure (i.e., subformulas) of the input
formula $\varphi$ by using Theorem 5. □

6 Application to the complexity of relational acyclic queries 

In the context of "classical", i.e., relational conjunctive queries, Theorem 9 immediately
yields the following improvement of the time bound $g(|Q|).|db|.|Q(db)|.\log^2 |db|$ (for
some function g) proved by [PY99] for the complexity of acyclic queries with inequalities.

Corollary 11 The ACQ$^{\neq}$ (resp. ACQ$_1^{\neq}$) query problem can be solved in time $f(|Q|).|db|.|Q(db)|$
(resp. $f(|Q|).|db|$) where Q is the input query and db is the input database.

Proof. This comes from Theorem 9 and from the fact that the class ACQ$^{\neq}$ can be
linearly interpreted by the class F-ACQ$^{\neq}$ (see Section 3). □

Another consequence of Theorems 6 and 9 is an alternative proof of the following
well-known result of [Yan81] (see also [FFG02]), which we slightly generalize since
restricted comparisons are now also allowed.

Corollary 12 The ACQ and ACQ$^{+}$ (resp. ACQ$_1$ and ACQ$_1^{+}$) query problems can be solved
in time $O(|Q|.|db|.|Q(db)|)$ (resp. $O(|Q|.|db|)$).

In a two-atom query each database predicate appears at most two times. This
kind of query has been studied in [KV00, Sar91], mainly in the context of query
containment. A consequence of Corollary 10 is the following.

Corollary 13 Any two-atom conjunctive query with inequalities can be evaluated in time
$O_Q(|db|)$, i.e., in time $O_Q(|T_1| + |T_2|)$ where $T_1$ and $T_2$ are the two input tables.

7 Enumeration of query results 

For all kinds of queries considered in this paper, the evaluation
process can be done in time $f(|Q|).|db|.|Q(db)|$. In other words, coming back to data
complexity, this is equivalent to saying that there exists a polynomial total time algorithm
(in the size of the input and the output) that generates the output tuples. It is natural
to ask whether one can say more on the efficiency of this enumeration process. This
could be justified, for example, in situations where only parts of the results are really
needed quickly or when having solutions one by one but regularly is required (e.g., in
order to be tested by another procedure that runs in parallel). Some remarks on this
subject are sketched in this section.

One of the most widely accepted notions of tractability in the context of generation
of solutions is the following. A problem P is said to be solvable within a Polynomial
(resp. linear) Delay if there exists an algorithm that outputs a first solution in polynomial
(resp. linear) time (in the size of the input only) and generates all solutions of P with
a polynomial (resp. linear) delay between two consecutive ones (see [JYP88] for an
introduction to complexity measures for enumeration problems). Of course, a Polynomial
Delay algorithm is polynomial total time (but, unless surprise, the converse is not true).

Not too surprisingly, our complexity results can be adapted to obtain polynomial,
even linear, delay algorithms for acyclic queries as shown by the following corollary.

Corollary 14 Generating all results of a F-ACQ$^{\neq}$ (resp. F-ACQ, F-ACQ$^{+}$, ACQ$^{\neq}$, ACQ,
ACQ$^{+}$) query can be done with a linear delay (and with linear space also).

Proof. We proceed in a similar way as for Theorem 9. Results for relational query
classes are obtained by reduction. Let $\mathcal{F}$ be a functional structure and $\varphi(x_1, \ldots, x_k)$ be
a functional query in F-ACQ$^{\neq}$, F-ACQ or F-ACQ$^{+}$.

The simple (recursive) algorithm below outputs all satisfying tuples of $\varphi(x_1, \ldots, x_k)$.

Algorithm 1 Eval(i, $\varphi(x_i, \ldots, x_k)$, $\mathcal{F}$, sol)
if i = k + 1 then
    Output sol
    return
end if
$E_i \leftarrow \{x_i \in D : (\mathcal{F}, x_i) \models \exists x_{i+1} \ldots \exists x_k\, \varphi\}$
for $a \in E_i$ do
    Eval(i + 1, $\varphi(x_i/a, x_{i+1}, \ldots, x_k)$, $\mathcal{F}$, (sol, a))
end for

Due to results of the preceding sections, computing $E_i$ can be done, in all cases, in
time $f(|\varphi|).|D|$. Then, running Eval(1, $\varphi(x_1, \ldots, x_k)$, $\mathcal{F}$, $\emptyset$) generates all solutions sol in
a depth-first manner with a linear delay between each of them. It can be easily rewritten
in a sequential way to use linear space. □
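
The depth-first enumeration of Algorithm 1 can be sketched with a generator. This is a sketch under assumptions of our own: `extendable(prefix)` is a hypothetical oracle telling whether a partial assignment can be completed into an answer (in the paper this role is played by the linear-time evaluation of a strict query), and the brute-force oracle used in the example is for illustration only and is of course not linear.

```python
from itertools import product


def enumerate_answers(D, k, extendable):
    """Yield every k-tuple over D accepted by the oracle, one at a time."""

    def rec(prefix):
        if len(prefix) == k:
            yield prefix  # a full answer: output it and backtrack
            return
        for a in D:
            candidate = prefix + (a,)
            if extendable(candidate):  # corresponds to the test a in E_i
                yield from rec(candidate)

    yield from rec(())


def brute_force_oracle(D, k, predicate):
    """Illustrative oracle: try all completions of the prefix (exponential)."""

    def extendable(prefix):
        rest = k - len(prefix)
        return any(predicate(prefix + tail) for tail in product(D, repeat=rest))

    return extendable


D = range(4)
increasing = lambda t: t[0] < t[1] < t[2]
answers = list(enumerate_answers(D, 3, brute_force_oracle(D, 3, increasing)))
```

Because a branch is entered only when the oracle guarantees that it leads to at least one answer, at most k oracle rounds separate two consecutive outputs, which is the source of the delay bound.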

8 Fixed-parameter linearity of some natural problems 

In this part of the paper, the different kinds of formulas introduced so far are used to
express classical algorithmic problems as query problems. This provides a simple
and uniform method to cope with the complexity of these problems. In all cases, the
complexity bound found with this method reaches or improves the best bound known so
far (at least in terms of data complexity). However, some of these problems have been
the object of intensive research, and recent optimized ad hoc algorithms (against which
a general and uniform method cannot compete) have better constant values.

8.1 Acyclic Subgraph problems

Given two graphs $G = (V; E)$ and $H = (V_H; E_H)$, H is said to be a subgraph (resp.
induced subgraph) of G if there is a one-to-one function g from $V_H$ to V such that, for all
$u, v \in V_H$, $E(g(u), g(v))$ if (resp. if and only if) $E_H(u, v)$. Also, a graph G is of maximum
degree d if none of its vertices belongs to more than d edges. This gives rise to the two
following problems.

ACYCLIC SUBGRAPH ISOMORPHISM (A.S.I.)

Input: an acyclic graph H and a graph G
Parameter: |H|.

Question: is H a subgraph of G?

ACYCLIC INDUCED SUBGRAPH ISOMORPHISM (A.I.S.I.)

Input: an acyclic graph H and a graph G of maximum degree d
Parameter: |H|, d.

Question: is H an induced subgraph of G?

The treewidth of a graph G is the minimum, over all tree decompositions of G, of the
maximal size of a node minus one. In [PV90] it is proved that for graphs H of treewidth at most w, testing whether H
is a subgraph (resp. induced subgraph) of G can be done in time /(|i/|).|G|"'^^ (resp. 
f{\H\,d).\G\'^~^^). For the particular case of acyclic graphs (which have tree width 1), 
the bounds given in [PV90] can be improved. The following corollary is easily obtained 
from our results. 

Corollary 15 The two following results hold:

• Problem A.S.I. can be solved in time $f(|H|).|G|$.

• Problem A.I.S.I. can be solved in time $f(|H|, d).|G|$.

For the two problems, generating all satisfying subgraphs can be done with a linear delay.

Proof. We will express problem A.S.I. as a Boolean ACQ$^{\neq}$ query. Let $G = (V; E)$,
$H = (V_H = \{h_1, \ldots, h_k\}; E_H)$ be the two input graphs. Let Q be the following formula:

$$Q \equiv \exists x_1 \ldots \exists x_k : \bigwedge_{1 \leq i < j \leq k} x_i \neq x_j \wedge \bigwedge_{E_H(h_i, h_j)} E(x_i, x_j)$$

Since H is acyclic, formula Q defines an ACQ$^{\neq}$ query whose size is linear in the size
of the graph H. It is easily seen that Q is true in G if and only if G admits H as a
subgraph. The complexity bound follows from Corollary 11.

For problem A.I.S.I., let again G and $H = (V_H = \{h_1, \ldots, h_k\}; E_H)$ be the two inputs
of the problem. Since G is of maximum degree d, we partition its vertex set V into d
sets $V^1, \ldots, V^d$ where each $V^a$ contains the vertices of degree a. This can be done in linear
time from G. We proceed in the same way for graph H and obtain the sets $V_H^1, \ldots, V_H^d$. In case
there exists a vertex in H of degree greater than d, it can be concluded immediately
that the problem has no solution. Now, let Q be the following formula:

$$Q \equiv \exists x_1 \ldots \exists x_k : \bigwedge_{1 \leq i < j \leq k} x_i \neq x_j \wedge \bigwedge_{a \leq d}\ \bigwedge_{h_i \in V_H^a} V^a(x_i) \wedge \bigwedge_{E_H(h_i, h_j)} E(x_i, x_j).$$

Formula Q simply checks that H is a subgraph of G and that each distinguished
vertex $x_i$ has the same degree as its associated vertex $h_i$ of H. The size of Q is linear
in the size of H and d. Again, Q defines a Boolean ACQ$^{\neq}$ query and the result follows
again from Corollary 11. The bound on the linear delay comes from Corollary 14. □

8.2 Covering and matching problems 

MULTIDIMENSIONAL MATCHING

Input: a set $M \subseteq X_1 \times \ldots \times X_r$ where the $X_i$ are pairwise disjoint
Parameter: r, k.

Question: is there a subset $M' \subseteq M$ with $|M'| = k$, such that no two
elements of $M'$ agree in any coordinate?

Corollary 16 Problem MULTIDIMENSIONAL MATCHING can be solved in time $O_{r,k}(|M|)$.

Proof. Let $\mathcal{F}_M = (M; f_1, \ldots, f_r)$ where for all $x = (x_1, \ldots, x_r) \in M$, it is set $f_i(x) = x_i$.
Then, there exists a multidimensional matching $M'$ of M if and only if:

$$\mathcal{F}_M \models \exists x_1 \ldots \exists x_k : \bigwedge_{i \leq r}\ \bigwedge_{1 \leq j < h \leq k} f_i(x_j) \neq f_i(x_h)$$

□
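
To make the meaning of this formula concrete, the following sketch evaluates it by brute force on a small instance. It is an illustration of the semantics only, under our own naming (`has_matching` is hypothetical): it runs in time roughly $|M|^k$ and is not the fixed-parameter linear algorithm that the corollary refers to.

```python
from itertools import combinations


def has_matching(M, k):
    """True if k tuples of M pairwise disagree on every coordinate."""
    M = list(M)
    r = len(M[0]) if M else 0

    def compatible(x, y):
        # Conjunction over i <= r of f_i(x) != f_i(y), with f_i the i-th projection.
        return all(x[i] != y[i] for i in range(r))

    return any(
        all(compatible(x, y) for x, y in combinations(choice, 2))
        for choice in combinations(M, k)
    )


M = [("x1", "y1"), ("x1", "y2"), ("x2", "y2")]
```

Here ("x1", "y1") and ("x2", "y2") form a matching of size 2, while no matching of size 3 exists since the first coordinate takes only two values.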

Corollary 17 below improves the bound of Of.^fc(jM|(log |M|)^) (reported in [DF99]) 
obtained by perfect hashing methods. However, a recent result of [FKN+04], based
on the color coding method of [AYZ95], gives a bound of $O(|M| + 2^{O(k)})$ for the
r-MULTIDIMENSIONAL MATCHING problem.

The following problems are also known to be fixed-parameter tractable [DF99]. 

UNIQUE HITTING SET

Input: a set X and k subsets $X_1, \ldots, X_k$ of X.
Parameter: k.

Question: is there a set $S \subseteq X$ such that for all i, $1 \leq i \leq k$, $|S \cap X_i| = 1$?

ANTICHAIN OF r-SUBSETS

Input: a collection $\mathcal{F}$ of r-subsets of a set X, a positive integer k.

Parameter: r, k.

Question: are there k subsets $S_1, \ldots, S_k \in \mathcal{F}$ such that $\forall i, j \in \{1, \ldots, k\}$
with $i \neq j$, both $S_i - S_j$ and $S_j - S_i$ are nonempty?

DISJOINT r-SUBSETS 

Input: a collection $\mathcal{F}$ of r-subsets of a set X, a positive integer k.
Parameter: r, k.

Question: are there k disjoint subsets of $\mathcal{F}$?
Corollary 17 Problems UNIQUE HITTING SET, ANTICHAIN OF r-SUBSETS and DISJOINT
r-SUBSETS can be solved in time $O_{r,k}(|M|)$. In all cases, the respective sets of solutions can be
generated with a linear delay.

Proof. The following acyclic formula holds for the UNIQUE HITTING SET problem:

\[ \varphi = \exists x_1 \ldots \exists x_k : \bigwedge_{i \le k} X_i(x_i) \;\wedge\; \bigwedge_{1 \le i < j \le k} (x_i \neq x_j \rightarrow \neg X_j(x_i)). \]

The formulas are similar for the two other problems. □
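
The idea of picking one witness $x_i$ per set $X_i$ can be illustrated by the following Python sketch. It is a brute-force check written directly from the problem definitions, not the fixed-parameter linear algorithm of Corollary 17, and the function names `unique_hitting_set` and `has_disjoint_subsets` are hypothetical.

```python
from itertools import combinations, product

def unique_hitting_set(subsets):
    """Search for a set S with |S & X_i| == 1 for every X_i in `subsets`.
    One witness is chosen per X_i, as in the existential prefix of the formula.
    Returns such a set S, or None if there is none."""
    for choice in product(*subsets):
        S = set(choice)
        if all(len(S & X) == 1 for X in subsets):
            return S
    return None

def has_disjoint_subsets(family, k):
    """Check whether `family` contains k pairwise disjoint members."""
    for chosen in combinations(family, k):
        if all(A.isdisjoint(B) for A, B in combinations(chosen, 2)):
            return True
    return False
```

Restricting the search to one witness per $X_i$ loses no solutions: if some $S$ meets every $X_i$ in exactly one element, the set of those elements is itself a solution.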

9 Conclusion: summary of results and open problems 

The following table summarizes the main results of this paper. For all our classes of
("classical", i.e., relational, or functional) queries, we use the notation $\varphi$
for the query formula, $S$ for the database, i.e., the input structure db or $\mathcal{F}$, and $\varphi(S)$
for the result of the query, i.e., the output.



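Query problems                                         Complexity

ACQ$_1$, ACQ$_1^+$, F-ACQ$_1$, F-ACQ$_1^+$             $|\varphi|.|S|$
ACQ$_1^{\neq}$, F-ACQ$_1^{\neq}$, F-FOf '2             $f(|\varphi|).|S|$
ACQ, ACQ$^+$, F-ACQ, F-ACQ$^+$                         $|\varphi|.|S|.|\varphi(S)|$
ACQ$^{\neq}$, F-ACQ$^{\neq}$                           $f(|\varphi|).|S|.|\varphi(S)|$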

Note that, among those complexity results, the only ones known before this paper
(to our knowledge) were those concerning ACQ and ACQ^.

We are convinced that (variants of) our techniques for constructing minimal samples
can be implemented efficiently to compute such queries. The reason is that we expect
the total number of minimal samples to be very low in most databases.

Finally, four lines of research are worth developing:

• Generalize our complexity results to tractable or f.p. tractable tree-like queries,
e.g., queries of bounded tree-width (see [CR00, FFG02]) or of bounded hypertree-width
([GLS02]). Our reduction technique from relational to functional queries,
which preserves acyclicity, may also make it possible to control the value of the tree-width
when passing from one context to the other.

• Apply our results to constraint satisfaction problems by using the now well-known 
correspondence between conjunctive query problems and constraint problems (see 
among others [KV00]).

• Enlarge the classes of tractable or f.p. tractable problems as much as possible,
i.e., determine the frontier of tractability/intractability, and obtain for the (f.p.)
tractable problems the best sequential or parallel algorithms; e.g., it is reasonable
to conjecture that the ACQ$^+$ evaluation problem is highly parallelizable, as
is known for ACQ (see for example [GLS01], which proves that this problem is
LOGCFL-complete).

• Apply our methods to queries over tree-structured data (recall that a rooted tree 
can be seen as a graph of a unary function). 
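
The remark that a rooted tree can be seen as the graph of a unary function can be illustrated by the short Python sketch below. The convention that the root is mapped to itself is an assumption made here to keep the function total, and the names `make_parent_function` and `siblings` are hypothetical. The sibling query is one example of a conjunctive query with an inequality over such a structure, evaluated naively.

```python
def make_parent_function(edges, root):
    """Encode a rooted tree as a unary function f with f(child) = parent.
    Assumed convention (not from the paper): f(root) = root."""
    parent = {root: root}
    for p, c in edges:
        parent[c] = p
    return lambda v: parent[v]

def siblings(f, nodes):
    """Naive evaluation of the query: x != y and f(x) = f(y),
    restricted to non-root nodes (those with f(x) != x)."""
    return {
        (x, y)
        for x in nodes
        for y in nodes
        if x != y and f(x) != x and f(y) != y and f(x) == f(y)
    }
```

For the tree with edges a-b, a-c and b-d rooted at a, the only sibling pairs are (b, c) and (c, b).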